geosnap.analyze.find_region_k
- geosnap.analyze.find_region_k(gdf, method=None, columns=None, spatial_weights='rook', temporal_index='year', unit_index='geoid', scaler='std', weights_kwargs=None, region_kwargs=None, min_k=2, max_k=10, return_table=False)
Brute-force search over every k from min_k to max_k, using cluster fit metrics to determine the optimal number of regions.
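The brute-force idea can be sketched as: fit one clustering per candidate k, score each fit, and keep the k with the best score. This is a minimal, non-authoritative sketch assuming plain (non-spatial) k-means stands in for geosnap's spatially constrained regionalization and the silhouette score stands in for its fit metrics; `kmeans`, `silhouette`, and `brute_force_k` are hypothetical helpers, not geosnap functions.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def kmeans(points: List[Point], k: int, n_iter: int = 100) -> List[int]:
    """Lloyd's k-means with a deterministic init: k evenly spaced points after sorting."""
    ordered = sorted(points)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centers = [list(ordered[round(i * step)]) for i in range(k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members (empty clusters keep their center).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centers.append(centers[j])
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient: (b - a) / max(a, b) averaged over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as the worst possible score
    total = 0.0
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            continue  # singletons contribute 0, following scikit-learn's convention
        a = sum(same) / len(same)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def brute_force_k(points: List[Point], min_k: int = 2, max_k: int = 10) -> Dict[int, float]:
    """Fit one clustering per k in [min_k, max_k] and record its silhouette score."""
    upper = min(max_k, len(points) - 1)  # the silhouette needs k < n
    return {k: silhouette(points, kmeans(points, k)) for k in range(min_k, upper + 1)}


# Toy data: three well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
scores = brute_force_k(data)
best_k = max(scores, key=scores.get)  # ties resolve to the smallest k
```

Unlike this sketch, the documented function works on spatially constrained regions and reports a best k for several fit metrics rather than a single one.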
- Parameters:
- gdf
geopandas.GeoDataFrame
a long-form geodataframe
- method
str, optional
the clustering method to use, by default None
- columns
list, optional
a list of columns in gdf to use in the clustering algorithm, by default None
- spatial_weights
[‘queen’, ‘rook’] or libpysal.weights.W object
spatial weights matrix specification. By default, geosnap will calculate Rook weights, but you can also pass a libpysal.weights.W object for more control over the specification.
- temporal_index
str, optional
column that uniquely identifies time periods, by default “year”
- unit_index
str, optional
column that uniquely identifies geographic units, by default “geoid”
- scaler
None or scaler from sklearn.preprocessing, optional
a scikit-learn preprocessing class that will be used to rescale the data. Defaults to sklearn.preprocessing.StandardScaler
- region_kwargs
dict, optional
additional kwargs passed to the clustering function in geosnap.analyze.regionalize
- min_k
int, optional
minimum number of clusters to test, by default 2
- max_k
int, optional
maximum number of clusters to test, by default 10
- return_table
bool, optional
if True, return the table of fit metrics for each combination of k and cluster method, by default False
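The `spatial_weights` options differ in what counts as a neighbor: under rook contiguity two areas are neighbors only if they share an edge, while queen contiguity also counts a shared vertex. The toy sketch below shows the difference on a regular grid of square cells; `grid_neighbors` is a hypothetical helper written for illustration, not libpysal's implementation, which derives contiguity from arbitrary polygon geometries.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]

ROOK_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # cells sharing an edge
QUEEN_OFFSETS = ROOK_OFFSETS + [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # edge or corner


def grid_neighbors(nrows: int, ncols: int, contiguity: str = "rook") -> Dict[Cell, Set[Cell]]:
    """Map each (row, col) cell of an nrows x ncols lattice to its contiguous neighbors."""
    offsets = {"rook": ROOK_OFFSETS, "queen": QUEEN_OFFSETS}.get(contiguity)
    if offsets is None:
        raise ValueError("contiguity must be 'rook' or 'queen'")
    return {
        (r, c): {
            (r + dr, c + dc)
            for dr, dc in offsets
            if 0 <= r + dr < nrows and 0 <= c + dc < ncols  # stay inside the grid
        }
        for r in range(nrows)
        for c in range(ncols)
    }


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
```

On such a grid an interior cell has 4 rook neighbors but 8 queen neighbors, so queen weights produce a denser connectivity graph.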
- Returns:
pandas.DataFrame
if return_table is False (default), a pandas DataFrame with a single column holding the optimal number of clusters according to each fit metric (the row index).
if return_table is True, the function also returns a table of fit metrics for each k from min_k to max_k.
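The single-column result can be read as a reduction of the full fit table: for each metric, pick the k with the best score, where "best" means highest for some metrics and lowest for others. The metric names and their directions below are assumptions (this page does not list which metrics are computed), and `optimal_k` is a hypothetical helper that uses plain dicts in place of the pandas objects geosnap returns.

```python
from typing import Dict

# Assumed metrics and their direction; not confirmed by the geosnap docstring.
HIGHER_IS_BETTER = {
    "silhouette": True,
    "calinski_harabasz": True,
    "davies_bouldin": False,  # a lower Davies-Bouldin index means better-separated clusters
}


def optimal_k(fit_table: Dict[int, Dict[str, float]]) -> Dict[str, int]:
    """Reduce a {k: {metric: score}} table to {metric: best k}."""
    metrics = next(iter(fit_table.values())).keys()
    best = {}
    for metric in metrics:
        pick = max if HIGHER_IS_BETTER[metric] else min
        best[metric] = pick(fit_table, key=lambda k: fit_table[k][metric])
    return best
```

Because each metric is optimized independently, different metrics can disagree on the best k, which is why the result holds one value per metric.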