Welcome to EarthStat¶
Welcome to EarthStat, your comprehensive tool for extracting geographical and statistical data. This notebook is designed to guide you through the initial setup, data preparation, and the various functionalities available in the EarthStat library.
Installation¶
!pip install earthstat
from earthstat import EarthStat
User Configuration For Extracting Statistical Info¶
Initialize the core settings:
- predictor_name: The name of the predictor being used.
- predictor_dir: The directory where the predictor's related files are stored.
- mask_file_path: The path to the mask file, used to calculate a weighted mean or to mask the raster.
- shapefile_file_path: The path to the shapefile containing geographical boundaries.
- selected_countries: A list of countries that defines the region of interest (ROI).
- country_column_name: The name of the column in the shapefile that contains country names.
- invalid_values: A list of values considered invalid within the dataset.
Important: Set invalid_values to None if you do not wish to exclude any values from the dataset's rasters.
predictor_name = 'FPAR'
predictor_dir = 'FPAR'
mask_file_path = 'crop mask/asap_mask_crop_v04.tif'
shapefile_file_path = 'shapefile/gaul1_asap.shp'
selected_countries = ["Norway", "Spain"]
country_column_name = 'adm0_name'
invalid_values = [255, 254, 251]
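Before handing these settings to EarthStat, it can help to confirm that the configured paths exist. The sketch below is not part of EarthStat: find_missing_paths is a hypothetical helper built only on the Python standard library, and it assumes the paths are relative to the notebook's working directory.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the labels of configured paths that do not exist on disk."""
    return [label for label, path in paths.items() if not Path(path).exists()]


# Same values as the configuration cell above.
config_paths = {
    "predictor_dir": "FPAR",
    "mask_file_path": "crop mask/asap_mask_crop_v04.tif",
    "shapefile_file_path": "shapefile/gaul1_asap.shp",
}

missing = find_missing_paths(config_paths)
if missing:
    print("Missing paths:", ", ".join(missing))
else:
    print("All configured paths exist.")
```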
Important to know:
Caution: Increasing the ROI size may cause system crashes during normal processing due to insufficient RAM.
fpar_aggregator = EarthStat(predictor_name)
Data initialization¶
Set up the foundational paths for your data processing pipeline. This includes initializing the directory for the predictor data, the path for the mask file, and the location of the shapefile. Each step is crucial for ensuring that the subsequent data processing and analysis can proceed smoothly.
fpar_aggregator.initDataDir(predictor_dir)
TIFF data found. Loading...
Predictor Summary:
predictor: FPAR
total_tiff_files: 2
date_range: 2015-07-21 to 2015-08-01
directory: FPAR
CRS: WGS 84
Extent: BoundingBox(left=-180.004464285715, bottom=-56.00446430667502, right=179.99553577188502, top=75.004464285715)
Data Type: uint8
NoData Value: 255.0
Spatial Resolution: 80640x29346
Pixel Size: (0.004464285715, 0.004464285715)
Predictor Paths Initialized Correctly, Initialize The Mask's Path
fpar_aggregator.initMaskPath(mask_file_path)
Mask Summary:
Mask_path: crop mask/asap_mask_crop_v04.tif
CRS: WGS 84
Extent: BoundingBox(left=-180.004464285715, bottom=-56.00446430667502, right=179.99553577188502, top=75.004464285715)
Data Type: uint8
NoData Value: None
Spatial Resolution: (80640, 29346)
Pixel Size: (0.004464285715, 0.004464285715)
Min/Max Value: (0, 100)
Mask Initialized Correctly, Initialize The Shapefile
fpar_aggregator.initShapefilePath(shapefile_file_path)
Shapefile Summary:
Geometry Type: ['MultiPolygon' None 'Polygon']
Coordinate Reference System (CRS): WGS 84
Extent: [-180. -55.7948999 180. 83.62741852]
Feature Count: 2368
Attributes: ['adm1_code', 'adm1_name', 'adm0_code', 'adm0_name', 'adm0_name_', 'adm1_name_', 'asap1_id', 'asap0_id', 'geometry']
Shapefile Initialized Correctly, You Can Check The Data Compatibility
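The "Spatial Resolution" and "Pixel Size" fields in the predictor summary are related: dividing the extent's width and height by the number of columns and rows should give the pixel size. The short check below uses only the numbers printed in that summary. It is plain arithmetic, not an EarthStat call.

```python
import math

# Values copied from the predictor summary printed above.
left, right = -180.004464285715, 179.99553577188502
bottom, top = -56.00446430667502, 75.004464285715
n_cols, n_rows = 80640, 29346

# Pixel size = extent length along an axis / number of pixels along that axis.
pixel_width = (right - left) / n_cols
pixel_height = (top - bottom) / n_rows

reported = 0.004464285715
print(math.isclose(pixel_width, reported, rel_tol=1e-9))   # True
print(math.isclose(pixel_height, reported, rel_tol=1e-9))  # True
```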
Check Data Compatibility and Fix Data Compatibility Issues¶
Evaluate the compatibility of projections and pixel sizes across the mask, raster, and shapefile to ensure seamless data integration. This check confirms that the projection systems align for the mask, raster, and shapefile, and it also verifies that the pixel sizes between the raster and mask are compatible.
fpar_aggregator.DataCompatibility()
NO ISSUE: The pixel sizes of the mask and predictor are identical: (0.004464285715, 0.004464285715)
NO ISSUE: The projections of the mask and predictor are identical: WGS 84
NO ISSUE: The projections of the raster data and shapefile are identical: WGS 84
COMPATIBILITY CHECK PASSED: The data is compatible. No resolution or projection mismatches were detected.
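To make the three comparisons concrete, here is a minimal sketch of the same kind of check on plain Python values. check_compatibility and its dictionary layout are hypothetical and only illustrate the logic described above. They are not EarthStat's implementation.

```python
import math


def check_compatibility(predictor, mask, shapefile_crs, tol=1e-9):
    """Return a list of readable issues; an empty list means compatible."""
    issues = []
    same_pixels = all(
        math.isclose(p, m, abs_tol=tol)
        for p, m in zip(predictor["pixel_size"], mask["pixel_size"])
    )
    if not same_pixels:
        issues.append("pixel size mismatch between mask and predictor")
    if predictor["crs"] != mask["crs"]:
        issues.append("projection mismatch between mask and predictor")
    if predictor["crs"] != shapefile_crs:
        issues.append("projection mismatch between raster data and shapefile")
    return issues


# Values taken from the summaries printed earlier in this notebook.
predictor = {"pixel_size": (0.004464285715, 0.004464285715), "crs": "WGS 84"}
mask = {"pixel_size": (0.004464285715, 0.004464285715), "crs": "WGS 84"}

print(check_compatibility(predictor, mask, "WGS 84"))  # []
```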
Resolving Data Compatibility Issues¶
This section addresses how to rectify issues identified by the data compatibility check. It focuses on resolving mismatches in pixel size between the raster and mask, or discrepancies in the Coordinate Reference System (CRS) among the raster, mask, and shapefile. The objective is to ensure uniformity in scale, resolution, and geospatial alignment across all datasets involved in the analysis.
Parameters:
- rescale_factor: Adjusts the scale of the data. The default is None, which keeps the original scale. To change the scale, specify a new range as a tuple, such as (0, 100).
- resampling_method: The technique used to resample the data. Options include "nearest", "bilinear", "cubic", and "average". The default is "bilinear", which suits a wide range of applications.
fpar_aggregator.fixCompatibilityIssues(rescale_factor=None,           # None = rescaling off
                                       resampling_method="bilinear")  # Default: bilinear
Checking for compatibility issues... No compatibility issues detected. Predictor, mask, and shapefile are already compatible.
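This notebook does not spell out how rescale_factor maps values onto the new range. One common interpretation, assumed here purely for illustration, is linear min-max rescaling. rescale_linear below is a hypothetical helper and may differ from what EarthStat does internally.

```python
def rescale_linear(values, new_range):
    """Linearly map values from their own min/max onto new_range (lo, hi)."""
    lo, hi = new_range
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant input has no spread to stretch; map everything to lo.
        return [float(lo) for _ in values]
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]


print(rescale_linear([0, 50, 200], (0, 100)))  # [0.0, 25.0, 100.0]
```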
Selecting Region of Interest (ROI) - Filter Shapefile¶
Specify the area for data analysis by identifying the region of interest. Configure the target ROI and link it to the corresponding column that designates country or area names within the dataset.
fpar_aggregator.selectRegionOfInterest(selected_countries,
country_column_name)
Filtered shapefile saved to: shapefile/filtered_gaul1_asap.shp Region of Interest (ROI) successfully selected based on the specified countries: Norway, Spain.
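Conceptually, the ROI step keeps only the shapefile features whose country column matches one of the selected countries. The sketch below shows that filter on plain dictionaries. filter_features and the sample rows are made up for illustration, and only the column names mirror the shapefile summary.

```python
def filter_features(features, column, selected):
    """Keep only the features whose `column` value is in `selected`."""
    wanted = set(selected)
    return [f for f in features if f.get(column) in wanted]


# Illustrative attribute rows, not read from the real shapefile.
features = [
    {"adm0_name": "Norway", "adm1_name": "Oslo"},
    {"adm0_name": "Spain", "adm1_name": "Galicia"},
    {"adm0_name": "France", "adm1_name": "Bretagne"},
]

roi = filter_features(features, "adm0_name", ["Norway", "Spain"])
print([f["adm1_name"] for f in roi])  # ['Oslo', 'Galicia']
```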
Clipping Predictor Data¶
Clip the predictor data to the boundaries defined in the main shapefile.
%time fpar_aggregator.clipPredictor()
Clipping the predictor data...
Clipping Rasters: 100%|██████████| 2/2 [00:08<00:00, 4.06s/it]
Clipping operation successful with the Region of Interest (ROI). CPU times: total: 8.02 s Wall time: 8.13 s
Note & Caution: This function uses multiprocessing. Using the main shapefile without filtering may lead to a system crash or errors because of the large number of geometry objects in the original shapefile.
Executing Data Aggregation¶
Start the data aggregation process, using the predictor data, the mask, and the filtered shapefile.
Parameters¶
- use_mask (bool): Specifies whether to apply a mask to the raster data. When set to True, the function uses the provided mask path to process only areas within the mask. Default is False.
- invalid_values (list of int): A list of pixel values to be treated as invalid and excluded from the aggregation. For example, [255, 254, 251] can be used to ignore values that represent no data or errors in the raster files.
- calculation_mode (str): Determines how pixel values are aggregated. Supported modes:
  - "overall_mean": Calculates the mean of all valid pixel values across the raster dataset.
  - "weighted_mean": Calculates the weighted mean of the valid pixel values, using the mask values as weights. This mode applies only when use_mask is True and a valid mask path is provided.
  - "filtered_mean": Uses the validated mask values to mask the data before calculating the mean. This mode is intended for cases where only the parts of the raster that meet conditions defined by the mask should contribute to the mean.
- all_touched (bool): If set to True, all pixels touched by geometries are included in the mask. If False, only pixels whose center is within the geometry or touching the geometry boundary are included. Default is False.
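The difference between the three calculation modes is easiest to see on a handful of numbers. The sketch below works on flat lists instead of rasters. aggregate is a hypothetical function, and the rule that "filtered_mean" keeps pixels whose mask value is greater than zero is an assumption, since the exact mask condition is not stated here.

```python
def aggregate(pixels, mask, invalid_values, mode):
    """Aggregate one region's pixels under the three modes described above."""
    invalid = set(invalid_values or [])
    # Pair each pixel with its mask value and drop invalid pixels first.
    pairs = [(p, m) for p, m in zip(pixels, mask) if p not in invalid]

    if mode == "overall_mean":
        values = [p for p, _ in pairs]
        return sum(values) / len(values) if values else None
    if mode == "weighted_mean":
        total_weight = sum(m for _, m in pairs)
        if total_weight == 0:
            return None
        return sum(p * m for p, m in pairs) / total_weight
    if mode == "filtered_mean":
        # Assumption: a pixel passes the filter when its mask value is > 0.
        values = [p for p, m in pairs if m > 0]
        return sum(values) / len(values) if values else None
    raise ValueError(f"unknown calculation mode: {mode}")


pixels = [10, 20, 30, 255]  # 255 is listed as invalid below
mask = [0, 50, 100, 100]    # mask values in the 0..100 range
invalid_values = [255, 254, 251]

print(aggregate(pixels, mask, invalid_values, "overall_mean"))   # 20.0
print(aggregate(pixels, mask, invalid_values, "weighted_mean"))  # 4000 / 150
print(aggregate(pixels, mask, invalid_values, "filtered_mean"))  # 25.0
```

Note how the pixel with mask value 0 still counts toward the overall mean, contributes nothing to the weighted mean, and is dropped entirely from the filtered mean.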
# Mask On
use_mask=True
calculation_mode="weighted_mean"
all_touched=False
%time fpar_aggregator.runAggregation(use_mask, invalid_values, calculation_mode, all_touched)
Starting aggregation... Starting aggregation with the selected Region of Interest (ROI) for FPAR.
Processing rasters: 100%|██████████| 2/2 [00:05<00:00, 2.72s/raster]
Aggregation complete. Data saved to Aggregated_FPAR.csv. CPU times: total: 5.92 s Wall time: 5.9 s
Parallel Processing with runParallelAggregation¶
The runParallelAggregation method is designed to process and aggregate raster data across multiple files in parallel, enhancing performance for large datasets. This method leverages multiple CPU cores to simultaneously process different portions of the data, reducing overall computation time.
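The general pattern of one worker process per raster can be sketched with the standard library's ProcessPoolExecutor. This only illustrates the idea and says nothing about how runParallelAggregation is implemented. mean_of_valid, run_parallel, and the pixel lists are hypothetical stand-ins for real raster files.

```python
from concurrent.futures import ProcessPoolExecutor


def mean_of_valid(job):
    """Worker: average one raster's pixels, skipping invalid values."""
    name, pixels, invalid_values = job
    valid = [p for p in pixels if p not in invalid_values]
    return name, (sum(valid) / len(valid) if valid else None)


def run_parallel(jobs, max_workers=2):
    """Process each job in a worker process and collect results by name."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(mean_of_valid, jobs))


if __name__ == "__main__":
    # Made-up pixel lists standing in for two raster files.
    jobs = [
        ("raster_a", [10, 20, 255], (255, 254, 251)),
        ("raster_b", [40, 60, 254], (255, 254, 251)),
    ]
    print(run_parallel(jobs))
```

The worker is a top-level function so that it can be pickled and sent to the worker processes, and the entry point is guarded so that child processes do not re-run it on import.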
# Mask Off
use_mask=False
calculation_mode="overall_mean"
all_touched=False
%time fpar_aggregator.runParallelAggregation(use_mask, invalid_values, calculation_mode, all_touched)
Starting aggregation... Starting aggregation with the selected Region of Interest (ROI) for FPAR.
Processing rasters: 100%|██████████| 2/2 [00:02<00:00, 1.33s/raster]
Aggregation complete. Data saved to Aggregated_overall_mean_FPAR_20240315_162031.csv. CPU times: total: 422 ms Wall time: 3.2 s
# Mask On
use_mask=True
calculation_mode="weighted_mean"
all_touched=False
%time fpar_aggregator.runParallelAggregation(use_mask, invalid_values, calculation_mode, all_touched)
Starting aggregation... Starting aggregation with the selected Region of Interest (ROI) for FPAR.
Processing rasters: 100%|██████████| 2/2 [00:04<00:00, 2.33s/raster]
Aggregation complete. Data saved to Aggregated_weighted_mean_FPAR_20240315_162034.csv. CPU times: total: 422 ms Wall time: 5.19 s
Caution: runAggregation and runParallelAggregation combine clipping and aggregation, so if you have not clipped the predictor beforehand, they do not save the clipped rasters.