geosnap.analyze.find_region_k
- geosnap.analyze.find_region_k(gdf, method=None, columns=None, spatial_weights='rook', temporal_index='year', unit_index='geoid', scaler='std', weights_kwargs=None, region_kwargs=None, min_k=2, max_k=10, return_table=False)
Brute-force search over cluster fit metrics to determine the optimal number of regions, k.
- Parameters:
- gdf
geopandas.GeoDataFrame a long-form geodataframe
- method
str, optional the clustering method to use, by default None
- columns
list, optional a list of columns in gdf to use in the clustering algorithm, by default None
- spatial_weights
['queen', 'rook'] or libpysal.weights.W object, optional spatial weights matrix specification. By default, geosnap will calculate Rook weights, but you can also pass a libpysal.weights.W object for more control over the specification.
- temporal_index
str, optional column that uniquely identifies time periods, by default “year”
- unit_index
str, optional column that uniquely identifies geographic units, by default “geoid”
- scaler
None or scaler from sklearn.preprocessing, optional a scikit-learn preprocessing class that will be used to rescale the data. Defaults to sklearn.preprocessing.StandardScaler
- region_kwargs
dict, optional additional kwargs passed to the clustering function in geosnap.analyze.regionalize
- min_k
int, optional minimum number of clusters to test, by default 2
- max_k
int, optional maximum number of clusters to test, by default 10
- return_table
bool, optional if True, return the table of fit metrics for each combination of k and cluster method, by default False
- Returns:
pandas.DataFrame if return_table==False (default), returns a pandas DataFrame with a single column that holds the optimal number of clusters according to each fit metric (row index).
if return_table==True, also returns a table of fit coefficients for each k between min_k and max_k
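The relationship between the two return values can be illustrated with a minimal, library-free sketch. It is not geosnap's implementation: the metric names, the scores, and the helper `best_k_per_metric` are hypothetical and chosen only for illustration. It assumes a table with one row per k and shows how the best k could be picked for each fit metric, treating the silhouette and Calinski-Harabasz scores as higher-is-better and the Davies-Bouldin score as lower-is-better.

```python
# Hypothetical fit-metric table: one entry per k, one score per metric.
# Metric names and numbers are made up for illustration only; they are
# not output from geosnap.
fit_table = {
    2: {"silhouette": 0.41, "calinski_harabasz": 120.0, "davies_bouldin": 1.30},
    3: {"silhouette": 0.47, "calinski_harabasz": 150.0, "davies_bouldin": 1.10},
    4: {"silhouette": 0.44, "calinski_harabasz": 165.0, "davies_bouldin": 0.95},
    5: {"silhouette": 0.39, "calinski_harabasz": 140.0, "davies_bouldin": 1.05},
}

# Assumption: larger is better for every metric except those listed here.
LOWER_IS_BETTER = {"davies_bouldin"}


def best_k_per_metric(table, lower_is_better=LOWER_IS_BETTER):
    """Return {metric: k}, the best-scoring k for each metric in the table."""
    metrics = next(iter(table.values())).keys()
    best = {}
    for metric in metrics:
        # Pick the k with the smallest or largest score, depending on the metric.
        pick = min if metric in lower_is_better else max
        best[metric] = pick(table, key=lambda k: table[k][metric])
    return best


summary = best_k_per_metric(fit_table)
for metric, k in summary.items():
    print(f"{metric}: k={k}")
```

Different metrics can disagree on the best k, as they do in this toy table, which is one reason to inspect the full table (return_table=True) rather than rely on a single row of the summary.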