Geospatial Metadata Extraction Toolkit Documentation
This page gives a detailed overview of the earthstat.geo_meta_extractors module, which provides specialized tools for extracting metadata from various geospatial data formats.
Metadata Extraction Capabilities
Extracting Predictor Variables from Metadata
predictorMeta(predictor_dir, predictor_name)
Generates a summary of the TIFF files in a specified directory, providing essential metadata about the geospatial data they contain. The function looks only for .tif files. It extracts a date from each filename to report the date range, and reads the remaining metadata from a single file (the first path returned by glob), so the summary describes the whole directory only if all files share the same grid.
The summary includes: the predictor name, the total number of TIFF files, the date range, the directory path, CRS, spatial extent, data type, NoData value, raster dimensions in pixels (reported as "Spatial Resolution"), and pixel size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predictor_dir | str | The path to the directory containing TIFF files. This directory is expected to exist and contain at least one `.tif` file. | required |
predictor_name | str | A descriptive name for the predictor. This name is used purely for identification purposes in the summary output. | required |
Exceptions:
Type | Description |
---|---|
FileNotFoundError | If `predictor_dir` does not exist or contains no `.tif` files. |
Returns:
Type | Description |
---|---|
dict | A dictionary containing extracted metadata. |
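The directory check and file discovery can be sketched with the standard library alone. This is a minimal illustration, not the toolkit's implementation: `list_tifs` is a hypothetical helper name, and raising `FileNotFoundError` for a directory without `.tif` files follows the contract stated in the function's docstring.

```python
import glob
import os
import tempfile


def list_tifs(directory):
    """Return the sorted paths of all .tif files directly inside `directory`.

    Hypothetical helper for illustration; not part of earthstat.
    """
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"The directory {directory} does not exist.")
    # glob returns matches in arbitrary order, so sort for reproducibility
    paths = sorted(glob.glob(os.path.join(directory, "*.tif")))
    if not paths:
        raise FileNotFoundError(f"No .tif files found in {directory}.")
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two TIFF placeholders and one unrelated file
        for name in ("b_20200201.tif", "a_20200101.tif", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        print([os.path.basename(p) for p in list_tifs(tmp)])
```

Sorting the matches is a choice made for this sketch: it makes "the first file" deterministic, which matters when one file's metadata stands in for the whole directory.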
Source code in earthstat/geo_meta_extractors/predictor_meta.py
import glob
import os

import rasterio
from pyproj import CRS  # assumption: imports are not shown on this page; pyproj's CRS exposes .name

# exDate and convDate are date helpers defined elsewhere in earthstat (not shown here).


def predictorMeta(predictor_dir, predictor_name):
    """
    Generates a summary of TIFF files within a specified directory, providing
    essential metadata about the geospatial data contained in these files.
    This function specifically looks for `.tif` files, extracting dates, spatial
    resolution, Coordinate Reference System (CRS), and other relevant metadata.
    The summary also includes: total number of TIFF files, the directory path, CRS, spatial extent, data type,
    NoData value, spatial resolution in pixels, and pixel size.

    Args:
        predictor_dir (str): The path to the directory containing TIFF files.
            This directory is expected to exist and contain at least one `.tif` file.
        predictor_name (str): A descriptive name for the predictor. This name is
            used purely for identification purposes in the summary output.

    Raises:
        FileNotFoundError: If `predictor_dir` does not exist or contains no `.tif` files.

    Returns:
        dict: A dictionary containing extracted metadata.
    """
    if not os.path.exists(predictor_dir):
        raise FileNotFoundError(
            f"The directory {predictor_dir} does not exist.")

    paths = glob.glob(os.path.join(predictor_dir, '*.tif'))
    if not paths:
        # Raise, as documented, instead of returning a message string
        raise FileNotFoundError(
            f"No TIFF files found in {predictor_dir}. Please ensure the "
            "directory is correct and contains TIFF files.")

    # Dates are parsed from the file names
    dates = [convDate(exDate(os.path.basename(path))) for path in paths]
    date_range = f"{min(dates)} to {max(dates)}" if dates else "No identifiable dates."

    # Grid metadata is read from the first matched file only
    with rasterio.open(paths[0]) as src:
        width, height = src.width, src.height
        crs = CRS(src.crs).name
        predictor_summary = {
            "predictor": predictor_name,
            "total_tiff_files": len(paths),
            "date_range": date_range,
            "directory": predictor_dir,
            "CRS": crs,
            "Extent": src.bounds,
            "Data Type": src.dtypes[0],
            "NoData Value": src.nodatavals[0],
            "Spatial Resolution": f"{width}x{height}",
            "Pixel Size": src.res
        }

    print("Predictor Summary:\n")
    print('\n'.join(f"{key}: {value}" for key,
                    value in predictor_summary.items()))
    return predictor_summary
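The helpers `exDate` and `convDate` used for the date range are not shown on this page. The sketch below assumes, purely for illustration, that each filename embeds one eight-digit `YYYYMMDD` date; the real helpers may expect a different pattern. `extract_date` and `date_range` are hypothetical names.

```python
import re
from datetime import datetime

# Assumption: the filename contains one 8-digit YYYYMMDD date
DATE_PATTERN = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def extract_date(filename):
    """Return the date embedded in `filename`, or None if there is none."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that do not form a valid calendar date


def date_range(filenames):
    """Summarize the earliest and latest dates found in `filenames`."""
    dates = [d for d in map(extract_date, filenames) if d is not None]
    if not dates:
        return "No identifiable dates."
    return f"{min(dates)} to {max(dates)}"


if __name__ == "__main__":
    names = ["ndvi_20200321.tif", "ndvi_20200101.tif", "readme.tif"]
    print(date_range(names))  # 2020-01-01 to 2020-03-21
```

Dropping filenames without a recognizable date before calling `min` and `max` keeps the comparison from failing on `None` values. How the toolkit's own helpers treat such filenames is not stated here.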
Generating Metadata Masks for Geospatial Analysis
maskSummary(raster_path)
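Generates a summary of a single-band raster file, including CRS, extent, data type, NoData value, raster dimensions in pixels (reported as 'Spatial Resolution'), pixel size, and min/max values. Only band 1 is read, and NoData pixels are masked out before the minimum and maximum are computed. Assumes the file is readable by rasterio and contains geospatial data.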
Parameters:
Name | Type | Description | Default |
---|---|---|---|
raster_path | str | Path to the raster file. | required |
Returns:
Type | Description |
---|---|
dict | Summary of raster properties. Includes 'Mask_path', 'CRS', 'Extent', 'Data Type', 'NoData Value', 'Spatial Resolution', 'Pixel Size', and 'Min/Max Value'. |
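The summary reports both 'Spatial Resolution' (the raster's width and height in pixels) and 'Pixel Size' (in CRS units). For a north-up raster the two are tied together by the extent. A minimal sketch of that relationship follows; `grid_shape` is a hypothetical helper, and the bounds order (left, bottom, right, top) mirrors rasterio's `bounds`.

```python
def grid_shape(bounds, pixel_size):
    """Return (width, height) in pixels for a north-up raster.

    bounds: (left, bottom, right, top) in CRS units.
    pixel_size: (x_size, y_size) in CRS units, both positive.
    """
    left, bottom, right, top = bounds
    x_size, y_size = pixel_size
    # Extent divided by pixel size gives the pixel count along each axis
    width = round((right - left) / x_size)
    height = round((top - bottom) / y_size)
    return width, height


if __name__ == "__main__":
    # A global grid at 0.25 degrees per pixel
    print(grid_shape((-180.0, -90.0, 180.0, 90.0), (0.25, 0.25)))  # (1440, 720)
```

Comparing these two values between a mask and a predictor is a quick way to see whether the rasters sit on the same grid.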
Source code in earthstat/geo_meta_extractors/mask_meta.py
import rasterio
from pyproj import CRS  # assumption: imports are not shown on this page; pyproj's CRS exposes .name


def maskSummary(raster_path):
    """
    Generates a summary of a single-band raster file, including CRS, extent, data type,
    NoData value, resolution, pixel size, and min/max values. Assumes the file is readable
    by rasterio and contains geospatial data.

    Args:
        raster_path (str): Path to the raster file.

    Returns:
        dict: Summary of raster properties. Includes 'Mask_path', 'CRS', 'Extent',
              'Data Type', 'NoData Value', 'Spatial Resolution', 'Pixel Size',
              and 'Min/Max Value'.
    """
    with rasterio.open(raster_path) as src:
        # Assuming there is a single band; NoData pixels are masked out
        band_data = src.read(1, masked=True)

        # Compute minimum and maximum values over the unmasked pixels
        min_value = band_data.min()
        max_value = band_data.max()
        crs = CRS(src.crs).name

        # Extracting essential information
        mask_summary = {
            "Mask_path": raster_path,
            "CRS": crs,
            "Extent": src.bounds,
            "Data Type": src.dtypes[0],
            "NoData Value": src.nodatavals[0],
            "Spatial Resolution": (src.width, src.height),
            "Pixel Size": src.res,
            "Min/Max Value": (min_value, max_value)
        }

    print("Mask Summary:\n")
    print('\n'.join(f"{key}: {value}" for key,
                    value in mask_summary.items()))
    return mask_summary
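Reading the band with `masked=True` means NoData pixels are excluded from the minimum and maximum. The plain-Python sketch below illustrates the same idea without rasterio or NumPy. `min_max_ignoring_nodata` is a hypothetical name, and returning `(None, None)` when every value is NoData is a choice made for this example only.

```python
def min_max_ignoring_nodata(values, nodata):
    """Return (min, max) of `values`, skipping entries equal to `nodata`."""
    valid = [v for v in values if v != nodata]
    if not valid:
        return None, None  # every pixel was NoData
    return min(valid), max(valid)


if __name__ == "__main__":
    band = [-9999, 0.2, 0.8, -9999, 0.5]
    print(min_max_ignoring_nodata(band, nodata=-9999))  # (0.2, 0.8)
```

Without the filtering step, the NoData sentinel (here -9999) would be reported as the minimum, which is why the masked read matters for the 'Min/Max Value' entry.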
Shapefile Metadata Extraction for Enhanced Data Insight
shapefileMeta(shapefile_path)
Summarizes key metadata of a shapefile, including geometry types, CRS, extent, feature count, and attribute names. Assumes the shapefile can be read using GeoPandas.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shapefile_path | str | Path to the shapefile. | required |
Returns:
Type | Description |
---|---|
dict | Contains 'Geometry Type', 'Coordinate Reference System (CRS)', 'Extent', 'Feature Count', and 'Attributes' of the shapefile. |
Source code in earthstat/geo_meta_extractors/shapefile_meta.py
import geopandas as gpd
from pyproj import CRS  # assumption: imports are not shown on this page; pyproj's CRS exposes .name


def shapefileMeta(shapefile_path):
    """
    Summarizes key metadata of a shapefile, including geometry types, CRS, extent,
    feature count, and attribute names. Assumes the shapefile can be read using
    GeoPandas.

    Args:
        shapefile_path (str): Path to the shapefile.

    Returns:
        dict: Contains 'Geometry Type', 'Coordinate Reference System (CRS)', 'Extent',
              'Feature Count', and 'Attributes' of the shapefile.
    """
    # Load the shapefile
    gdf = gpd.read_file(shapefile_path)
    crs = CRS(gdf.crs).name

    # Extracting essential information
    shapefile_meta = {
        "Geometry Type": gdf.geometry.type.unique(),
        "Coordinate Reference System (CRS)": crs,
        "Extent": gdf.total_bounds,
        "Feature Count": len(gdf),
        "Attributes": list(gdf.columns)
    }

    print("Shapefile Summary:\n")
    print('\n'.join(f"{key}: {value}" for key,
                    value in shapefile_meta.items()))
    return shapefile_meta
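The 'Extent' entry comes from `gdf.total_bounds`, which GeoPandas returns as (minx, miny, maxx, maxy) over all features. The following sketch reproduces that reduction on plain tuples. `total_bounds` here is a hypothetical stand-alone function, not the GeoPandas attribute, and raising `ValueError` on empty input is a choice made for this example.

```python
def total_bounds(feature_bounds):
    """Combine per-feature (minx, miny, maxx, maxy) boxes into one box."""
    boxes = list(feature_bounds)
    if not boxes:
        raise ValueError("at least one bounding box is required")
    minx = min(b[0] for b in boxes)
    miny = min(b[1] for b in boxes)
    maxx = max(b[2] for b in boxes)
    maxy = max(b[3] for b in boxes)
    return minx, miny, maxx, maxy


if __name__ == "__main__":
    # Two overlapping feature boxes
    print(total_bounds([(0, 0, 2, 2), (1, -1, 3, 1)]))  # (0, -1, 3, 2)
```

Note the ordering differs in name from rasterio's `bounds` (left, bottom, right, top), although the two line up position by position for north-up data.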