
Geospatial Metadata Extraction Toolkit Documentation

An overview of the earthstat.geo_meta_extractors module, which provides functions for extracting metadata from geospatial raster files (such as GeoTIFFs) and shapefiles.

Metadata Extraction Capabilities

Extracting Metadata from Predictor Rasters

predictorMeta(predictor_dir, predictor_name)

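Generates a summary of the TIFF files in a specified directory, providing essential metadata about the geospatial data they contain. Only files with the .tif extension are considered. Dates are parsed from the file names, while the Coordinate Reference System (CRS) and the other raster properties are read from the first file found, so the summary is representative of the whole directory only if all files share the same grid, CRS, and data type.

The summary includes: the predictor name, the total number of TIFF files, the date range, the directory path, CRS, spatial extent, data type, NoData value, raster dimensions in pixels (reported as "Spatial Resolution", width x height), and pixel size. It is also printed to standard output.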
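The date range is derived from the file names: each name passes through the package's exDate and convDate helpers, and the earliest and latest results bound the range. Those helpers are not documented on this page, so the sketch below uses a hypothetical stand-in, extract_date, which assumes names embed an ISO date such as ndvi_2020-03-01.tif. This naming pattern is an assumption for illustration; adapt it to your own files.

```python
import re
from datetime import date

# Hypothetical filename pattern: an ISO date (YYYY-MM-DD) somewhere in the name.
# This is an illustrative assumption, not the format earthstat's exDate expects.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract_date(filename):
    """Hypothetical stand-in for exDate + convDate: return a date, or None if absent."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def date_range(filenames):
    """Build a 'start to end' string from the dates found in the file names."""
    dates = [d for d in (extract_date(name) for name in filenames) if d is not None]
    if not dates:
        return "No identifiable dates."
    return f"{min(dates)} to {max(dates)}"


files = ["ndvi_2020-03-01.tif", "ndvi_2019-12-15.tif", "ndvi_2020-01-10.tif"]
print(date_range(files))  # 2019-12-15 to 2020-03-01
```

Because dates are compared as datetime.date objects rather than strings, the range is correct regardless of the order in which the files are listed.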

Parameters:

predictor_dir (str, required): The path to the directory containing TIFF files. This directory is expected to exist and contain at least one .tif file.

predictor_name (str, required): A descriptive name for the predictor. This name is used purely for identification purposes in the summary output.

Exceptions:

FileNotFoundError: If predictor_dir does not exist or contains no .tif files.
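predictorMeta discovers its inputs with glob(os.path.join(predictor_dir, '*.tif')), so files ending in .tiff are never counted, and on POSIX systems, where glob matching is case-sensitive, neither are files ending in .TIF. If your predictor files use those extensions, a pre-check along the following lines can help. find_tiffs is a hypothetical helper for illustration, not part of earthstat.

```python
import os
import tempfile

# Hypothetical helper (not part of earthstat): accept .tif and .tiff in any letter case.
TIFF_EXTENSIONS = (".tif", ".tiff")


def find_tiffs(directory):
    """Return sorted paths of regular files in `directory` with a .tif/.tiff extension."""
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"The directory {directory} does not exist.")
    paths = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(TIFF_EXTENSIONS)
        and os.path.isfile(os.path.join(directory, name))
    ]
    if not paths:
        raise FileNotFoundError(f"No .tif or .tiff files found in {directory}.")
    return sorted(paths)


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.tif", "b.TIF", "c.tiff", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    print([os.path.basename(p) for p in find_tiffs(tmp)])  # ['a.tif', 'b.TIF', 'c.tiff']
```

Lower-casing the file name before the suffix check makes the match behave the same on every operating system, unlike glob patterns.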

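Returns:

dict: A dictionary containing the extracted metadata, with the keys "predictor", "total_tiff_files", "date_range", "directory", "CRS", "Extent", "Data Type", "NoData Value", "Spatial Resolution", and "Pixel Size".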

Source code in earthstat/geo_meta_extractors/predictor_meta.py
import glob
import os

import rasterio
from pyproj import CRS  # pyproj's CRS provides the .name attribute used below

# exDate and convDate are earthstat's filename date helpers, defined elsewhere in the package.


def predictorMeta(predictor_dir, predictor_name):
    """
    Generates a summary of the TIFF files in a specified directory, providing
    essential metadata about the geospatial data contained in these files.
    Only `.tif` files are considered. Dates are parsed from the file names;
    the Coordinate Reference System (CRS) and the other raster properties are
    read from the first file found.

    The summary also includes: total number of TIFF files, the directory path, CRS, spatial extent, data type,
    NoData value, raster dimensions in pixels, and pixel size.

    Args:
        predictor_dir (str): The path to the directory containing TIFF files.
            This directory is expected to exist and contain at least one `.tif` file.
        predictor_name (str): A descriptive name for the predictor. This name is
            used purely for identification purposes in the summary output.

    Raises:
        FileNotFoundError: If `predictor_dir` does not exist or contains no `.tif` files.

    Returns:
        dict: A dictionary containing the extracted metadata.
    """
    if not os.path.exists(predictor_dir):
        raise FileNotFoundError(
            f"The directory {predictor_dir} does not exist.")

    # Sorting makes the reference file (paths[0]) deterministic across runs
    paths = sorted(glob.glob(os.path.join(predictor_dir, '*.tif')))
    if not paths:
        # Raise instead of returning a string, as the docstring promises
        raise FileNotFoundError(
            f"No .tif files found in {predictor_dir}. Please ensure the directory is correct.")

    # Dates come from the file names, not from the raster contents
    dates = [convDate(exDate(os.path.basename(path))) for path in paths]
    date_range = f"{min(dates)} to {max(dates)}" if dates else "No identifiable dates."

    # Build the summary while the dataset is still open; src must not be used after the with block
    with rasterio.open(paths[0]) as src:
        width, height = src.width, src.height
        predictor_summary = {
            "predictor": predictor_name,
            "total_tiff_files": len(paths),
            "date_range": date_range,
            "directory": predictor_dir,
            "CRS": CRS(src.crs).name,
            "Extent": src.bounds,
            "Data Type": src.dtypes[0],
            "NoData Value": src.nodatavals[0],
            "Spatial Resolution": f"{width}x{height}",
            "Pixel Size": src.res
        }

    print("Predictor Summary:\n")
    print('\n'.join(f"{key}: {value}" for key, value in predictor_summary.items()))
    return predictor_summary
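The summary reports both "Spatial Resolution" (the raster's width x height in pixels) and "Pixel Size" (rasterio's res, the x and y size of one pixel in CRS units). For a north-up raster without rotation, the two are linked through the extent: pixel width = (right - left) / width and pixel height = (top - bottom) / height. The sketch below checks that relationship with plain numbers; the bounds and dimensions are made up for illustration, and Bounds merely mirrors the field order of rasterio's BoundingBox.

```python
from collections import namedtuple

# Same field order as rasterio's BoundingBox: left, bottom, right, top.
Bounds = namedtuple("Bounds", ["left", "bottom", "right", "top"])


def pixel_size(bounds, width, height):
    """Derive the (x, y) pixel size from an extent and raster dimensions (north-up, unrotated)."""
    return ((bounds.right - bounds.left) / width,
            (bounds.top - bounds.bottom) / height)


# Illustrative values: a 1-degree by 0.5-degree tile stored as 400 x 200 pixels.
extent = Bounds(left=10.0, bottom=45.0, right=11.0, top=45.5)
print(pixel_size(extent, width=400, height=200))  # (0.0025, 0.0025)
```

If the computed values disagree with the reported "Pixel Size", the raster is likely rotated or sheared, and the simple formula above does not apply.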

Summarizing Mask Raster Metadata

maskSummary(raster_path)

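Generates a summary of a single-band raster file, including CRS, extent, data type, NoData value, raster dimensions (width and height in pixels), pixel size, and min/max values. Only the first band is read, and NoData pixels are masked out, so the min/max values describe valid pixels only. Assumes the file is readable by rasterio and contains geospatial data. The summary is also printed to standard output.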

Parameters:

raster_path (str, required): Path to the raster file.

Returns:

dict: Summary of raster properties. Includes 'Mask_path', 'CRS', 'Extent', 'Data Type', 'NoData Value', 'Spatial Resolution' (a (width, height) tuple in pixels), 'Pixel Size', and 'Min/Max Value'.

Source code in earthstat/geo_meta_extractors/mask_meta.py
import rasterio
from pyproj import CRS  # pyproj's CRS provides the .name attribute used below


def maskSummary(raster_path):
    """
    Generates a summary of a single-band raster file, including CRS, extent, data type, 
    NoData value, raster dimensions, pixel size, and min/max values. Assumes the file is
    readable by rasterio and contains geospatial data.

    Args:
        raster_path (str): Path to the raster file.

    Returns:
        dict: Summary of raster properties. Includes 'Mask_path', 'CRS', 'Extent', 
              'Data Type', 'NoData Value', 'Spatial Resolution', 'Pixel Size', 
              and 'Min/Max Value'.
    """
    with rasterio.open(raster_path) as src:

        # Read only the first band; masked=True hides NoData pixels
        band_data = src.read(1, masked=True)

        # Minimum and maximum over valid (non-NoData) pixels
        min_value = band_data.min()
        max_value = band_data.max()
        crs = CRS(src.crs).name

        # Collect the essential properties while the dataset is still open
        mask_summary = {
            "Mask_path": raster_path,
            "CRS": crs,
            "Extent": src.bounds,
            "Data Type": src.dtypes[0],
            "NoData Value": src.nodatavals[0],
            "Spatial Resolution": (src.width, src.height),
            "Pixel Size": src.res,
            "Min/Max Value": (min_value, max_value)
        }

    print("Mask Summary:\n")
    print('\n'.join(f"{key}: {value}" for key, value in mask_summary.items()))

    return mask_summary
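Because maskSummary reads band 1 with masked=True, rasterio returns a NumPy masked array in which pixels equal to the NoData value are masked, so the reported min/max ignore them. The sketch below imitates that behavior with numpy.ma on a small in-memory array, without opening a file; the pixel values and the NoData value of -9999 are illustrative assumptions.

```python
import numpy as np

NODATA = -9999  # illustrative NoData value

band = np.array([[0, 1, 1],
                 [NODATA, 0, 1]], dtype=np.int16)

# Mask the NoData cells, similar to what rasterio's read(1, masked=True) yields.
masked = np.ma.masked_equal(band, NODATA)

print(band.min(), band.max())      # -9999 1  (the NoData value pollutes the minimum)
print(masked.min(), masked.max())  # 0 1      (only valid pixels are considered)
```

Without the mask, a binary 0/1 mask raster would report the NoData sentinel as its minimum, which is why the masked read matters here.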

Extracting Shapefile Metadata

shapefileMeta(shapefile_path)

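Summarizes key metadata of a shapefile: the distinct geometry types, CRS, extent (the [minx, miny, maxx, maxy] bounding box of all features), feature count, and column names, including the geometry column. Assumes the shapefile can be read using GeoPandas. The summary is also printed to standard output.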

Parameters:

shapefile_path (str, required): Path to the shapefile.

Returns:

dict: Contains 'Geometry Type', 'Coordinate Reference System (CRS)', 'Extent', 'Feature Count', and 'Attributes' of the shapefile.

Source code in earthstat/geo_meta_extractors/shapefile_meta.py
import geopandas as gpd
from pyproj import CRS  # pyproj's CRS provides the .name attribute used below


def shapefileMeta(shapefile_path):
    """
    Summarizes key metadata of a shapefile, including geometry types, CRS, extent,
    feature count, and attribute names. Assumes the shapefile can be read using
    GeoPandas.

    Args:
        shapefile_path (str): Path to the shapefile.

    Returns:
        dict: Contains 'Geometry Type', 'Coordinate Reference System (CRS)', 'Extent',
              'Feature Count', and 'Attributes' of the shapefile.
    """
    # Load the shapefile into a GeoDataFrame
    gdf = gpd.read_file(shapefile_path)
    crs = CRS(gdf.crs).name

    # Extract the essential information
    shapefile_meta = {
        "Geometry Type": gdf.geometry.geom_type.unique(),
        "Coordinate Reference System (CRS)": crs,
        "Extent": gdf.total_bounds,  # array: [minx, miny, maxx, maxy]
        "Feature Count": len(gdf),
        "Attributes": list(gdf.columns)  # includes the geometry column
    }
    print("Shapefile Summary:\n")
    print('\n'.join(f"{key}: {value}" for key, value in shapefile_meta.items()))

    return shapefile_meta
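The "Extent" entry is GeoDataFrame.total_bounds, an array [minx, miny, maxx, maxy] that encloses every feature. Conceptually it is the union of the per-feature bounding boxes, as in the pure-Python sketch below. GeoPandas computes this for you; the helper name and the coordinates here are illustrative only.

```python
def total_bounds(feature_bounds):
    """Combine per-feature (minx, miny, maxx, maxy) boxes into one enclosing box."""
    if not feature_bounds:
        raise ValueError("at least one bounding box is required")
    minxs, minys, maxxs, maxys = zip(*feature_bounds)
    return (min(minxs), min(minys), max(maxxs), max(maxys))


# Illustrative bounding boxes of three features.
boxes = [
    (2.0, 48.0, 3.0, 49.0),
    (1.5, 47.5, 2.5, 48.5),
    (2.8, 48.2, 4.0, 50.0),
]
print(total_bounds(boxes))  # (1.5, 47.5, 4.0, 50.0)
```

Each coordinate of the result can come from a different feature, so the extent is generally larger than any single feature's bounding box.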