Data Compatibility

The earthstat.data_compatibility module checks whether the raster dataset, mask, and shapefile used in a geospatial analysis share a spatial resolution and CRS, and resolves mismatches by resampling the mask or reprojecting the shapefile.

Ensuring Data Compatibility

Checking Data Compatibility

checkDataCompatibility(raster_data_path, mask_path, shapefile_path)

Checks spatial resolution and CRS compatibility among a raster dataset, mask, and shapefile.

Determines if the mask needs resampling to match the raster dataset's resolution or if the shapefile needs reprojecting to match the raster's CRS. Identifies overall data compatibility.
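The decision rules described above can be sketched without touching any files. The helper below is hypothetical (`derive_actions` is not part of earthstat); it takes resolutions as plain tuples and CRS names as plain strings, and mirrors the flag logic visible in the source listing for `checkDataCompatibility`. The sample resolutions and CRS names are illustrative values only.

```python
def derive_actions(mask_res, raster_res, mask_crs, raster_crs, shapefile_crs):
    """Hypothetical helper: mirrors the flag logic of checkDataCompatibility."""
    actions = {
        "resample_mask": False,
        "reproject_shapefile": False,
        "is_compatible": True,
    }
    # A resolution mismatch means the mask has to be resampled.
    if mask_res != raster_res:
        actions["resample_mask"] = True
        actions["is_compatible"] = False
    # A mask/raster CRS mismatch is flagged as incompatible,
    # but no dedicated action flag exists for it.
    if mask_crs != raster_crs:
        actions["is_compatible"] = False
    # A raster/shapefile CRS mismatch means the shapefile has to be reprojected.
    if raster_crs != shapefile_crs:
        actions["reproject_shapefile"] = True
        actions["is_compatible"] = False
    return actions


matching = derive_actions((0.1, 0.1), (0.1, 0.1), "WGS 84", "WGS 84", "WGS 84")
mismatched = derive_actions((0.05, 0.05), (0.1, 0.1), "WGS 84", "WGS 84", "ETRS89")
print(matching)
print(mismatched)
```

Note that a mask whose CRS differs from the raster's sets is_compatible to False without setting either action flag, so the caller has to handle that case separately.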

Parameters:

Name | Type | Description | Default
raster_data_path | str | Path to the raster dataset file. | required
mask_path | str | Path to the mask file. | required
shapefile_path | str | Path to the shapefile. | required

Returns:

Type | Description
dict | A dictionary indicating required actions (resample_mask, reproject_shapefile) and overall compatibility (is_compatible).

Exceptions:

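Type | Description
CRSError | If there's an issue reading the CRS data from any file.
Exception | For general errors encountered during processing.

Both exception types are caught inside the function (see the source listing): the error message is printed and the actions dictionary is returned as it stood when the error occurred, so neither exception propagates to the caller.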

Source code in earthstat/data_compatibility/data_compatibility.py
def checkDataCompatibility(raster_data_path, mask_path, shapefile_path):
    """
    Checks spatial resolution and CRS compatibility among a raster dataset, mask, and shapefile.

    Determines if the mask needs resampling to match the raster dataset's resolution or if the shapefile
    needs reprojecting to match the raster's CRS. Identifies overall data compatibility.

    Args:
        raster_data_path (str): Path to the raster dataset file.
        mask_path (str): Path to the mask file.
        shapefile_path (str): Path to the shapefile.

    Returns:
        dict: A dictionary indicating required actions (resample_mask, reproject_shapefile) and 
              overall compatibility (is_compatible).

    Raises:
        CRSError: If there's an issue reading the CRS data from any file.
        Exception: For general errors encountered during processing.
    """
    actions = {'resample_mask': False,
               'reproject_shapefile': False, 'is_compatible': True}

    try:
        with rasterio.open(mask_path) as mask, rasterio.open(raster_data_path) as raster_data:
            checkPixelSize(mask, raster_data)
            if mask.res != raster_data.res:
                actions['resample_mask'] = True
                actions['is_compatible'] = False

            mask_crs_name = CRS(mask.crs).name
            raster_data_crs_name = CRS(raster_data.crs).name
            checkProjection(mask_crs_name, raster_data_crs_name,
                            "mask", "predictor")
            if mask_crs_name != raster_data_crs_name:
                actions['is_compatible'] = False

        shapefile = gpd.read_file(shapefile_path)
        shapefile_crs_name = CRS(shapefile.crs).name
        checkProjection(raster_data_crs_name, shapefile_crs_name,
                        "raster data", "shapefile")
        if raster_data_crs_name != shapefile_crs_name:
            actions['reproject_shapefile'] = True
            actions['is_compatible'] = False

        return actions
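    except CRSError as e:
        # The CRS could not be read. The error is reported, not re-raised.
        # `actions` is returned as it stood when the error occurred, so
        # 'is_compatible' may still be True even though the check did not finish.
        print(f"Error reading CRS data: {e}")
        return actions
    except Exception as e:
        # Any other failure (for example an unreadable file) is handled the same way.
        print(f"An error occurred: {e}")
        return actions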
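The source above compares resolutions with `mask.res != raster_data.res`, which is an exact comparison of float tuples. If two resolutions differ only by floating-point rounding, an exact comparison would still request resampling. The sketch below shows a tolerance-based alternative; `same_resolution` and its `rel_tol` default are assumptions made for illustration and are not part of earthstat.

```python
import math


def same_resolution(res_a, res_b, rel_tol=1e-9):
    """Hypothetical helper: compare two (x, y) resolutions with a tolerance."""
    if len(res_a) != len(res_b):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(res_a, res_b))


exact = (0.3, 0.3)
noisy = (0.1 + 0.2, 0.3)  # 0.1 + 0.2 is not exactly 0.3 in binary floating point

print(noisy != exact)                               # exact comparison sees a mismatch
print(same_resolution(noisy, exact))                # tolerant comparison treats them as equal
print(same_resolution((0.05, 0.05), (0.1, 0.1)))    # a genuine mismatch is still detected
```

Whether a tolerance is appropriate depends on the data; a genuinely different grid should still trigger resampling, so the tolerance should stay far below the pixel size.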

Resolving Compatibility Issues

Addressing and Fixing Data Compatibility Issues

processCompatibilityIssues(actions, mask_path, predictor_data_path, shapefile_path, rescale_factor=None, resampling_method='bilinear')

Resolves the identified compatibility issues by resampling the mask and/or reprojecting the shapefile to match the predictor dataset's specifications.

Parameters:

Name | Type | Description | Default
actions | dict | A dictionary indicating which compatibility actions are required. | required
mask_path | str | Path to the original mask file. | required
predictor_data_path | str | Path to the raster dataset used as the predictor. | required
shapefile_path | str | Path to the original shapefile. | required
rescale_factor | tuple | Min and max values for rescaling the mask data. | None
resampling_method | str | Method to use for resampling ('bilinear' by default). | 'bilinear'

Returns:

Type | Description
dict | Updated paths for the processed mask and shapefile.

The returned dictionary has the keys crop_mask and shapefile. Each key holds the path of the resampled mask or reprojected shapefile when is_compatible is False and the corresponding flag in actions is set; otherwise it holds the original input path.
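The dispatch described here can be sketched with stand-in callables, so it runs without raster files or earthstat installed. Everything below is hypothetical: `resolve_paths`, `fake_resample`, `fake_reproject`, and the output file names are placeholders, and the real function calls `rescaleResampleMask` and `reprojectShapefileToRaster` instead.

```python
def resolve_paths(actions, mask_path, shapefile_path, resample, reproject):
    """Hypothetical mirror of the dispatch logic in processCompatibilityIssues."""
    # Start from the original inputs; only the flagged ones are replaced.
    updated = {"crop_mask": mask_path, "shapefile": shapefile_path}
    if not actions["is_compatible"]:
        if actions["resample_mask"]:
            updated["crop_mask"] = resample(mask_path)
        if actions["reproject_shapefile"]:
            updated["shapefile"] = reproject(shapefile_path)
    return updated


def fake_resample(path):
    # Placeholder: the real output path is chosen by the library.
    return path.replace(".tif", "_resampled.tif")


def fake_reproject(path):
    # Placeholder: the real output path is chosen by the library.
    return path.replace(".shp", "_reprojected.shp")


actions = {"resample_mask": True, "reproject_shapefile": False, "is_compatible": False}
paths = resolve_paths(actions, "mask.tif", "regions.shp", fake_resample, fake_reproject)
print(paths)
```

Only the mask path changes in this run because only resample_mask is set; the shapefile path is passed through untouched.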

Source code in earthstat/data_compatibility/process_comp_issues.py
def processCompatibilityIssues(actions, mask_path, predictor_data_path, shapefile_path, rescale_factor=None, resampling_method="bilinear"):
    """
    Processes identified compatibility issues by resampling masks and/or reprojection of shapefiles
    to match a predictor dataset's specifications.

    Args:
        actions (dict): A dictionary indicating which compatibility actions are required.
        mask_path (str): Path to the original mask file.
        predictor_data_path (str): Path to the raster dataset used as the predictor.
        shapefile_path (str): Path to the original shapefile.
        rescale_factor (tuple, optional): Min and max values for rescaling the mask data.
        resampling_method (str): Method to use for resampling ('bilinear' by default).

    Returns:
        dict: Updated paths for the processed mask and shapefile.

    Performs resampling of the mask and reprojection of the shapefile based on the actions specified
    in the `actions` dictionary. Returns updated file paths for these processed files.
    """
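    # Default to the original inputs; entries are replaced only when an action runs.
    updated_paths = {
        'crop_mask': mask_path,
        'shapefile': shapefile_path
    }

    if not actions['is_compatible']:
        # Resample (and optionally rescale) the mask to the predictor's grid.
        if actions['resample_mask']:
            updated_paths['crop_mask'] = rescaleResampleMask(mask_path,
                                                             predictor_data_path,
                                                             scale_factor=rescale_factor,
                                                             resampling_method=resampling_method)

        # Reproject the shapefile to the predictor's CRS.
        if actions['reproject_shapefile']:
            print("\nReprojecting shapefile...")
            updated_paths['shapefile'] = reprojectShapefileToRaster(
                predictor_data_path, shapefile_path)

        # If neither flag is set (for example, only the mask and predictor CRS
        # differ), nothing is changed and the original paths are returned.

    else:
        print("No compatibility issues detected. Proceeding without resampling or reprojection.")

    return updated_paths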