Data Compatibility
The earthstat.data_compatibility module checks for, and resolves, spatial resolution and CRS mismatches among the raster, mask, and shapefile inputs used in geospatial analysis.
Ensuring Data Compatibility
Checking Data Compatibility
checkDataCompatibility(raster_data_path, mask_path, shapefile_path)
Checks spatial resolution and CRS compatibility among a raster dataset, mask, and shapefile.
Determines if the mask needs resampling to match the raster dataset's resolution or if the shapefile needs reprojecting to match the raster's CRS. Identifies overall data compatibility.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
raster_data_path | str | Path to the raster dataset file. | required |
mask_path | str | Path to the mask file. | required |
shapefile_path | str | Path to the shapefile. | required |
Returns:
Type | Description |
---|---|
dict | A dictionary indicating required actions (resample_mask, reproject_shapefile) and overall compatibility (is_compatible). |
Exceptions:
Type | Description |
---|---|
CRSError | If there's an issue reading the CRS data from any file. |
Exception | For general errors encountered during processing. |
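The decision logic behind the returned dictionary can be sketched without opening any files. The following is a minimal illustration, not the library's code: `compatibility_actions` is a hypothetical helper that takes already-extracted resolutions and CRS names as plain values, whereas `checkDataCompatibility` reads them from the files on disk. The CRS names in the example call are illustrative strings only.

```python
def compatibility_actions(mask_res, raster_res, mask_crs, raster_crs, shapefile_crs):
    """Hypothetical helper mirroring the documented flags, using plain values."""
    actions = {"resample_mask": False, "reproject_shapefile": False, "is_compatible": True}

    # Different pixel sizes: the mask must be resampled to the raster grid.
    if mask_res != raster_res:
        actions["resample_mask"] = True
        actions["is_compatible"] = False

    # A mask/raster CRS mismatch marks the data incompatible,
    # but there is no dedicated action flag for it.
    if mask_crs != raster_crs:
        actions["is_compatible"] = False

    # A raster/shapefile CRS mismatch requests a shapefile reprojection.
    if raster_crs != shapefile_crs:
        actions["reproject_shapefile"] = True
        actions["is_compatible"] = False

    return actions


result = compatibility_actions(
    mask_res=(0.01, 0.01),
    raster_res=(0.05, 0.05),
    mask_crs="WGS 84",
    raster_crs="WGS 84",
    shapefile_crs="ETRS89-extended / LAEA Europe",
)
print(result)
```

Note that in the source listing for `checkDataCompatibility`, `CRSError` and other exceptions are caught and printed, and the `actions` dictionary is returned in whatever state it had reached. After a failed read, `is_compatible` can therefore still be `True`, so it may be worth checking the printed output before relying on the result.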
Source code in earthstat/data_compatibility/data_compatibility.py
def checkDataCompatibility(raster_data_path, mask_path, shapefile_path):
    """
    Checks spatial resolution and CRS compatibility among a raster dataset, mask, and shapefile.

    Determines if the mask needs resampling to match the raster dataset's resolution or if the shapefile
    needs reprojecting to match the raster's CRS. Identifies overall data compatibility.

    Args:
        raster_data_path (str): Path to the raster dataset file.
        mask_path (str): Path to the mask file.
        shapefile_path (str): Path to the shapefile.

    Returns:
        dict: A dictionary indicating required actions (resample_mask, reproject_shapefile) and
        overall compatibility (is_compatible).

    Raises:
        CRSError: If there's an issue reading the CRS data from any file.
        Exception: For general errors encountered during processing.
    """
    actions = {'resample_mask': False,
               'reproject_shapefile': False, 'is_compatible': True}

    try:
        with rasterio.open(mask_path) as mask, rasterio.open(raster_data_path) as raster_data:
            checkPixelSize(mask, raster_data)
            if mask.res != raster_data.res:
                actions['resample_mask'] = True
                actions['is_compatible'] = False

            mask_crs_name = CRS(mask.crs).name
            raster_data_crs_name = CRS(raster_data.crs).name
            checkProjection(mask_crs_name, raster_data_crs_name,
                            "mask", "predictor")
            if mask_crs_name != raster_data_crs_name:
                actions['is_compatible'] = False

        shapefile = gpd.read_file(shapefile_path)
        shapefile_crs_name = CRS(shapefile.crs).name
        checkProjection(raster_data_crs_name, shapefile_crs_name,
                        "raster data", "shapefile")
        if raster_data_crs_name != shapefile_crs_name:
            actions['reproject_shapefile'] = True
            actions['is_compatible'] = False

        return actions
    except CRSError as e:
        print(f"Error reading CRS data: {e}")
        return actions
    except Exception as e:
        print(f"An error occurred: {e}")
        return actions
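The listing compares `mask.res` and `raster_data.res` with `!=`, which is an exact comparison of float tuples. Two grids whose pixel sizes differ only by floating-point rounding would therefore be flagged for resampling. If that matters for your data, a tolerance-based comparison is one option. The sketch below is an assumption about how this could be done: `same_resolution` and its `rel_tol` default are hypothetical and not part of earthstat.

```python
import math


def same_resolution(res_a, res_b, rel_tol=1e-9):
    """Return True when two (x_res, y_res) tuples match within a relative tolerance."""
    return all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(res_a, res_b))


# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
print((0.1 + 0.2, 0.3) == (0.3, 0.3))                 # prints False (exact comparison)
print(same_resolution((0.1 + 0.2, 0.3), (0.3, 0.3)))  # prints True (tolerant comparison)
```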
Resolving Compatibility Issues
Addressing and Fixing Data Compatibility Issues
processCompatibilityIssues(actions, mask_path, predictor_data_path, shapefile_path, rescale_factor=None, resampling_method='bilinear')
Processes identified compatibility issues by resampling the mask and/or reprojecting the shapefile to match the predictor dataset's specifications.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
actions | dict | A dictionary indicating which compatibility actions are required. | required |
mask_path | str | Path to the original mask file. | required |
predictor_data_path | str | Path to the raster dataset used as the predictor. | required |
shapefile_path | str | Path to the original shapefile. | required |
rescale_factor | tuple | Min and max values for rescaling the mask data. | None |
resampling_method | str | Method to use for resampling ('bilinear' by default). | 'bilinear' |
Returns:
Type | Description |
---|---|
dict | Updated paths for the processed mask and shapefile. |
Resamples the mask and reprojects the shapefile as requested in the actions dictionary, then returns the file paths to use downstream. A path is returned unchanged when its action is not required.
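The dispatch behaviour described above can be illustrated with stand-in functions. This is a simplified sketch under stated assumptions, not the library's implementation: `process_issues`, `fake_resample`, and `fake_reproject` are hypothetical names, and the stand-ins only derive a new file name instead of writing any file. The `crop_mask` and `shapefile` keys match those in the source listing.

```python
def process_issues(actions, mask_path, shapefile_path, resample, reproject):
    """Hypothetical dispatcher: call only the fixes that `actions` asks for."""
    # Start from the original paths; they are kept when no fix is needed.
    updated_paths = {"crop_mask": mask_path, "shapefile": shapefile_path}
    if not actions["is_compatible"]:
        if actions["resample_mask"]:
            updated_paths["crop_mask"] = resample(mask_path)
        if actions["reproject_shapefile"]:
            updated_paths["shapefile"] = reproject(shapefile_path)
    return updated_paths


# Stand-in callables: they only build a new file name and touch no files.
def fake_resample(path):
    return path.replace(".tif", "_resampled.tif")


def fake_reproject(path):
    return path.replace(".shp", "_reprojected.shp")


actions = {"resample_mask": False, "reproject_shapefile": True, "is_compatible": False}
paths = process_issues(actions, "mask.tif", "regions.shp", fake_resample, fake_reproject)
print(paths)
```

Here only the shapefile path changes, because `resample_mask` is `False`; the mask path is passed through untouched.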
Source code in earthstat/data_compatibility/process_comp_issues.py
def processCompatibilityIssues(actions, mask_path, predictor_data_path, shapefile_path, rescale_factor=None, resampling_method="bilinear"):
    """
    Processes identified compatibility issues by resampling masks and/or reprojection of shapefiles
    to match a predictor dataset's specifications.

    Args:
        actions (dict): A dictionary indicating which compatibility actions are required.
        mask_path (str): Path to the original mask file.
        predictor_data_path (str): Path to the raster dataset used as the predictor.
        shapefile_path (str): Path to the original shapefile.
        rescale_factor (tuple, optional): Min and max values for rescaling the mask data.
        resampling_method (str): Method to use for resampling ('bilinear' by default).

    Returns:
        dict: Updated paths for the processed mask and shapefile.

    Performs resampling of the mask and reprojection of the shapefile based on the actions specified
    in the `actions` dictionary. Returns updated file paths for these processed files.
    """
    updated_paths = {
        'crop_mask': mask_path,
        'shapefile': shapefile_path
    }

    if not actions['is_compatible']:
        if actions['resample_mask']:
            updated_paths['crop_mask'] = rescaleResampleMask(mask_path,
                                                             predictor_data_path,
                                                             scale_factor=rescale_factor,
                                                             resampling_method=resampling_method)
        if actions['reproject_shapefile']:
            print("\nReprojecting shapefile...")
            updated_paths['shapefile'] = reprojectShapefileToRaster(
                predictor_data_path, shapefile_path)
    else:
        print("No compatibility issues detected. Proceeding without resampling or reprojection.")

    return updated_paths