geosnap.analyze.find_region_k

geosnap.analyze.find_region_k(gdf, method=None, columns=None, spatial_weights='rook', temporal_index='year', unit_index='geoid', scaler='std', weights_kwargs=None, region_kwargs=None, min_k=2, max_k=10, return_table=False)

Brute-force through cluster fit metrics to determine the optimal number of regions, k.

Parameters:
gdf : geopandas.GeoDataFrame

a long-form geodataframe

method : str, optional

the clustering method to use, by default None

columns : list, optional

a list of columns in gdf to use in the clustering algorithm, by default None

spatial_weights : [‘queen’, ‘rook’] or libpysal.weights.W object

spatial weights matrix specification. By default, geosnap will calculate Rook weights, but you can also pass a libpysal.weights.W object for more control over the specification.

temporal_index : str, optional

column that uniquely identifies time periods, by default “year”

unit_index : str, optional

column that uniquely identifies geographic units, by default “geoid”

scaler : None or scaler from sklearn.preprocessing, optional

a scikit-learn preprocessing class that will be used to rescale the data. Defaults to sklearn.preprocessing.StandardScaler

region_kwargs : dict, optional

additional kwargs passed to the clustering function in geosnap.analyze.regionalize

min_k : int, optional

minimum number of clusters to test, by default 2

max_k : int, optional

maximum number of clusters to test, by default 10

return_table : bool, optional

if True, return the table of fit metrics for each combination of k and cluster method, by default False
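A call using the parameters above might look like the following minimal sketch. The wrapper name `optimal_regions` is hypothetical, and the method name "ward_spatial" is an assumption about what geosnap.analyze.regionalize accepts; the keyword arguments themselves are taken from the signature shown on this page. The caller is assumed to supply a long-form GeoDataFrame with "year" and "geoid" columns.

```python
def optimal_regions(gdf, columns, max_k=8):
    """Return the optimal k per fit metric for a long-form GeoDataFrame.

    Hypothetical helper: `gdf` is assumed to have "year" and "geoid"
    columns plus the attribute columns named in `columns`.
    """
    # Imported lazily so the sketch can be read without geosnap installed.
    from geosnap.analyze import find_region_k

    return find_region_k(
        gdf,
        method="ward_spatial",    # assumed method name, not confirmed here
        columns=columns,
        spatial_weights="queen",  # one of the two documented string options
        temporal_index="year",
        unit_index="geoid",
        min_k=2,
        max_k=max_k,
        return_table=False,       # only the per-metric summary is returned
    )
```

Keeping max_k small limits the number of regionalization runs, since every candidate k between min_k and max_k has to be fitted.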

Returns:
pandas.DataFrame

If return_table is False (the default), returns a pandas DataFrame with one row per fit metric and a single column holding the optimal number of clusters according to that metric.

If return_table is True, also returns a table of fit coefficients for each k between min_k and max_k.
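The relationship between the full fit table and the one-column summary can be sketched in plain Python. This is an illustration, not geosnap's implementation: the metric names, the scores, and the helper `best_k_per_metric` are all made up for the example. It assumes the usual conventions that higher silhouette and Calinski-Harabasz scores are better and a lower Davies-Bouldin score is better.

```python
# Hypothetical fit table: one entry per candidate k, one score per metric.
fit_table = {
    2: {"silhouette_score": 0.41, "calinski_harabasz_score": 120.0, "davies_bouldin_score": 1.10},
    3: {"silhouette_score": 0.48, "calinski_harabasz_score": 150.0, "davies_bouldin_score": 0.95},
    4: {"silhouette_score": 0.45, "calinski_harabasz_score": 162.0, "davies_bouldin_score": 0.90},
    5: {"silhouette_score": 0.39, "calinski_harabasz_score": 140.0, "davies_bouldin_score": 1.05},
}

# Direction of each metric: True means a larger score is better.
HIGHER_IS_BETTER = {
    "silhouette_score": True,
    "calinski_harabasz_score": True,
    "davies_bouldin_score": False,
}


def best_k_per_metric(table, higher_is_better):
    """Pick, for each metric, the k with the best score in the table."""
    best = {}
    for metric, maximize in higher_is_better.items():
        scores = {k: row[metric] for k, row in table.items()}
        pick = max if maximize else min
        best[metric] = pick(scores, key=scores.get)
    return best


summary = best_k_per_metric(fit_table, HIGHER_IS_BETTER)
```

Different metrics can disagree on the best k, which is why the summary reports one value per metric rather than a single number.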