geosnap.analyze.find_region_k

geosnap.analyze.find_region_k(gdf, method=None, columns=None, spatial_weights='rook', temporal_index='year', unit_index='geoid', scaler='std', weights_kwargs=None, region_kwargs=None, min_k=2, max_k=10, return_table=False)

Brute-force search over a range of k values, using cluster fit metrics to determine the optimal number of regions.

Parameters:

gdf (geopandas.GeoDataFrame): a long-form geodataframe
method (str, optional): the clustering method to use, by default None
columns (list, optional): a list of columns in gdf to use in the clustering algorithm, by default None
spatial_weights ({‘queen’, ‘rook’} or libpysal.weights.W, optional): spatial weights matrix specification. By default, geosnap calculates rook weights, but you can also pass a libpysal.weights.W object for more control over the specification.
temporal_index (str, optional): column that uniquely identifies time periods, by default “year”
unit_index (str, optional): column that uniquely identifies geographic units, by default “geoid”
scaler (None or scaler from sklearn.preprocessing, optional): a scikit-learn preprocessing class used to rescale the data, by default ‘std’ (sklearn.preprocessing.StandardScaler)
weights_kwargs (dict, optional): additional kwargs passed to the spatial weights constructor, by default None
region_kwargs (dict, optional): additional kwargs passed to the clustering function in geosnap.analyze.regionalize, by default None
min_k (int, optional): minimum number of clusters to test, by default 2
max_k (int, optional): maximum number of clusters to test, by default 10
return_table (bool, optional): if True, also return the table of fit metrics for each combination of k and cluster method, by default False
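The brute-force idea can be sketched with scikit-learn alone. This is not geosnap's implementation: it assumes Ward clustering with a rook (4-neighbor) connectivity constraint on a toy grid stands in for `method` and `spatial_weights`, and that the fit metrics are silhouette, Calinski-Harabasz and Davies-Bouldin. The function `brute_force_region_k` is hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler


def brute_force_region_k(X, connectivity, min_k=2, max_k=10):
    """Hypothetical helper: fit a contiguity-constrained clustering for every k
    in [min_k, max_k] and record assumed fit metrics for each fit."""
    # Rescale the attributes, analogous to scaler='std'
    X_std = StandardScaler().fit_transform(X)
    rows = []
    for k in range(min_k, max_k + 1):
        # Ward clustering that may only merge units adjacent in the connectivity graph
        labels = AgglomerativeClustering(
            n_clusters=k, linkage="ward", connectivity=connectivity
        ).fit_predict(X_std)
        rows.append(
            {
                "k": k,
                "silhouette_score": silhouette_score(X_std, labels),
                "calinski_harabasz_score": calinski_harabasz_score(X_std, labels),
                "davies_bouldin_score": davies_bouldin_score(X_std, labels),
            }
        )
    return rows


# Toy data: a 10 x 10 grid of units, each with two random attributes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
# Rook (4-neighbor) adjacency between grid cells, in row-major order
connectivity = grid_to_graph(10, 10)
table = brute_force_region_k(X, connectivity, min_k=2, max_k=6)
```

Swapping Ward clustering for another contiguity-aware method only changes the loop body; the scoring step and the shape of the resulting table stay the same.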

Returns:

pandas.DataFrame

If return_table is False (default), returns a pandas.DataFrame with a single column holding the optimal number of clusters according to each fit metric; the fit metrics form the row index.

If return_table is True, also returns a table of fit coefficients for each k between min_k and max_k.
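Reducing a fit table to one optimal k per metric can be sketched with pandas. The metric names, their directions (higher is better for silhouette and Calinski-Harabasz, lower is better for Davies-Bouldin) and the `best_k` column name are assumptions, and the scores below are made-up illustrative values, not geosnap output.

```python
import pandas as pd

# Made-up fit table for illustration: one row per k, one column per assumed metric
fit_table = pd.DataFrame(
    {
        "silhouette_score": [0.31, 0.42, 0.38, 0.35],
        "calinski_harabasz_score": [120.0, 150.0, 165.0, 140.0],
        "davies_bouldin_score": [1.10, 0.90, 0.95, 1.05],
    },
    index=pd.Index([2, 3, 4, 5], name="k"),
)

# One row per metric: idxmax where higher is better, idxmin where lower is better
best_k = pd.DataFrame(
    {
        "best_k": [
            fit_table["silhouette_score"].idxmax(),
            fit_table["calinski_harabasz_score"].idxmax(),
            fit_table["davies_bouldin_score"].idxmin(),
        ]
    },
    index=fit_table.columns,
)
```

As the example shows, different metrics can disagree on the optimal k, which is why the result keeps one row per metric rather than a single number.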