geosnap.harmonize.harmonize
- geosnap.harmonize.harmonize(gdf, target_year=None, target_gdf=None, weights_method='area', extensive_variables=None, intensive_variables=None, allocate_total=True, raster=None, pixel_values=None, temporal_index='year', unit_index=None, verbose=False)
Use spatial interpolation to standardize neighborhood boundaries over time.
- Parameters:
- gdf
geopandas.GeoDataFrame
Long-form geodataframe with a column, named by temporal_index, that holds the time period of each observation.
- target_year
str
The target time period whose boundaries form the target, i.e. the boundaries in which all other time periods should be expressed (Optional).
- unit_index
str, optional
The column on the geodataframe that identifies unique units in the time series. If None, the geodataframe index will be used, and the unique identifier for each unit will be set to “id”.
- target_gdf: geopandas.GeoDataFrame
A geodataframe whose boundaries are the interpolation target for all time periods. For example, to convert all time periods to a hexgrid, generate a set of hexagonal polygons using tobler <https://pysal.org/tobler/generated/tobler.util.h3fy.htm> and pass the resulting geodataframe as this argument. (Optional).
- weights_method
str
The method by which the harmonization will be conducted. This can be set to:
- “area”: harmonization using simple area-weighted interpolation.
- “dasymetric”: harmonization using area-weighted interpolation with raster-based ancillary data to mask out uninhabited land.
- extensive_variables
list
The names of the columns in gdf that hold extensive variables to be harmonized (see (2) in Notes).
- intensive_variables
list
The names of the columns in gdf that hold intensive variables to be harmonized (see (2) in Notes).
- allocate_total
bool
True if the total value of each source area should be allocated. False if the denominator is the area of source polygon i. Note that the two cases are identical when the area of the source polygon is exhausted by intersections. See (3) in Notes for more details.
- raster
str
The path to a local raster image to be used as a dasymetric mask. Required when using “dasymetric”.
- codes
list of ints
List of raster pixel values that should be considered ‘populated’. Since this draws inspiration from the National Land Cover Database (NLCD), the default is 21 (Developed, Open Space), 22 (Developed, Low Intensity), 23 (Developed, Medium Intensity) and 24 (Developed, High Intensity). The description of each code can be found here: https://www.mrlc.gov/sites/default/files/metadata/landcover.html Ignored if not using dasymetric harmonization.
- force_crs_match
bool
Whether the Coordinate Reference System (CRS) of the polygons will be reprojected to the CRS of the raster file. Default is True, and it is recommended to leave it that way. Only taken into consideration for raster-based harmonization.
- verbose: bool
Whether to print warnings from tobler (usually NaN replacement warnings). Default is False.
- gdf
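To show how the parameters above fit together, here is a minimal usage sketch. The keyword names come from the signature at the top of this page; the column names `population`, `median_income` and `geoid`, the `harmonize_kwargs` dict and the wrapper `run_harmonize` are hypothetical and only serve the illustration.

```python
# Keyword arguments for an area-weighted harmonization.
# Column names are hypothetical; replace them with columns present in your gdf.
harmonize_kwargs = dict(
    weights_method="area",                  # simple area-weighted interpolation
    extensive_variables=["population"],     # counts and totals
    intensive_variables=["median_income"],  # rates, averages, medians
    allocate_total=True,
    temporal_index="year",                  # column holding the time period
    unit_index="geoid",                     # column identifying each unit
)


def run_harmonize(gdf, target_year):
    """Express every time period in gdf in the boundaries of target_year."""
    # Imported lazily so this sketch loads even where geosnap is not installed.
    from geosnap.harmonize import harmonize

    return harmonize(gdf, target_year=target_year, **harmonize_kwargs)
```

Passing `target_gdf` instead of `target_year` would, per the parameter description, interpolate every period onto a custom set of polygons rather than onto one period's boundaries.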
Notes
1) Each GeoDataFrame of raw_community is assumed to have a ‘year’ column. Also, all GeoDataFrames must have the same Coordinate Reference System (CRS).
2) A quick explanation of extensive and intensive variables can be found here: https://www.esri.com/about/newsroom/arcuser/understanding-statistical-data-for-mapping-purposes/
3) For an extensive variable, the estimate at target polygon j (default case) is:
v_j = sum_i v_i w_{i,j}
w_{i,j} = a_{i,j} / sum_k a_{i,k}
If the area of the source polygon is not exhausted by intersections with target polygons and there is reason to not allocate the complete value of an extensive attribute, then setting allocate_total=False will use the following weights:
v_j = sum_i v_i w_{i,j}
w_{i,j} = a_{i,j} / a_i
where a_i is the total area of source polygon i.
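As a worked illustration of the two extensive weighting schemes, the sketch below computes the weights by hand in plain Python. The polygon names, areas and values are made up, and the helper `extensive_estimates` is hypothetical; it mirrors the formulas, not the internals of geosnap or tobler.

```python
def extensive_estimates(values, areas, inter, allocate_total=True):
    """Area-weighted estimates of an extensive variable at target polygons.

    values: {i: v_i}       value of the variable in source polygon i
    areas:  {i: a_i}       total area of source polygon i
    inter:  {(i, j): a_ij} area of the intersection of source i and target j
    """
    if allocate_total:
        # Denominator is sum_k a_{i,k}: only the intersected part of source i.
        denom = {}
        for (i, _), a_ij in inter.items():
            denom[i] = denom.get(i, 0.0) + a_ij
    else:
        # Denominator is a_i: the full area of source i.
        denom = dict(areas)

    estimates = {}
    for (i, j), a_ij in inter.items():
        w_ij = a_ij / denom[i]
        estimates[j] = estimates.get(j, 0.0) + values[i] * w_ij
    return estimates


# Hypothetical data: only 8 of the 10 area units of "s1" fall inside a target.
values = {"s1": 100.0, "s2": 50.0}
areas = {"s1": 10.0, "s2": 4.0}
inter = {("s1", "t1"): 6.0, ("s1", "t2"): 2.0, ("s2", "t2"): 4.0}

full = extensive_estimates(values, areas, inter, allocate_total=True)
partial = extensive_estimates(values, areas, inter, allocate_total=False)
```

With these numbers the default case gives 75 for each target, so the source total of 150 is preserved. With `allocate_total=False` the targets receive 60 and 70, and the 20 units belonging to the uncovered part of `s1` are not allocated.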
For an intensive variable, the estimate at target polygon j is:
v_j = sum_i v_i w_{i,j}
w_{i,j} = a_{i,j} / sum_k a_{k,j}
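The intensive case differs only in the denominator, which is summed over the sources that overlap each target. The sketch below works through it in plain Python; the data and the helper `intensive_estimates` are hypothetical and are meant to mirror the formula, not the library's implementation.

```python
def intensive_estimates(values, inter):
    """Area-weighted estimates of an intensive variable at target polygons.

    values: {i: v_i}       value of the variable in source polygon i
    inter:  {(i, j): a_ij} area of the intersection of source i and target j
    """
    # Denominator per target j is sum_k a_{k,j}: the covered area of target j.
    denom = {}
    for (_, j), a_ij in inter.items():
        denom[j] = denom.get(j, 0.0) + a_ij

    estimates = {}
    for (i, j), a_ij in inter.items():
        w_ij = a_ij / denom[j]
        estimates[j] = estimates.get(j, 0.0) + values[i] * w_ij
    return estimates


# Hypothetical data: an average-like variable such as median age.
values = {"s1": 20.0, "s2": 40.0}
inter = {("s1", "t1"): 6.0, ("s1", "t2"): 2.0, ("s2", "t2"): 6.0}

est = intensive_estimates(values, inter)
```

Here `t1` lies entirely within `s1` and inherits its value of 20, while `t2` is one quarter `s1` and three quarters `s2`, giving 35. Because the weights for each target sum to one, the result is an area-weighted average rather than a redistributed total.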