geosnap.analyze.regionalize

geosnap.analyze.regionalize(gdf, n_clusters=6, spatial_weights='rook', method=None, columns=None, threshold_variable='count', threshold=10, temporal_index='year', unit_index='geoid', scaler='std', weights_kwargs=None, region_kwargs=None, model_colname=None, return_model=False)

Create a spatial geodemographic typology by running a cluster analysis on the metro area’s neighborhood attributes, subject to a contiguity constraint.
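
To make the contiguity constraint concrete, the following minimal sketch checks whether every cluster forms a single connected region under a given adjacency structure. It illustrates the idea only and is not geosnap's implementation; the helper `is_contiguous`, the `neighbors` dictionary, and the toy labels are all hypothetical.

```python
from collections import deque


def is_contiguous(labels, neighbors):
    """Return True if every cluster is one connected component.

    labels: dict mapping unit id -> cluster label
    neighbors: dict mapping unit id -> iterable of adjacent unit ids
    """
    for cluster in set(labels.values()):
        members = {u for u, lab in labels.items() if lab == cluster}
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        # Breadth-first search that only walks through units in the same cluster
        while queue:
            unit = queue.popleft()
            for nb in neighbors.get(unit, ()):
                if nb in members and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if seen != members:
            return False
    return True


# Hypothetical toy data: four units in a row, a - b - c - d
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

contiguous_labels = {"a": 0, "b": 0, "c": 1, "d": 1}
split_labels = {"a": 0, "b": 1, "c": 0, "d": 1}

print(is_contiguous(contiguous_labels, neighbors))  # True
print(is_contiguous(split_labels, neighbors))       # False
```

An unconstrained clustering could produce either labeling; a regionalization method is expected to return only labelings of the first kind.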

Parameters:

gdf: geopandas.GeoDataFrame: long-form geodataframe holding neighborhood attribute and geometry data.
n_clusters: int: the number of clusters to model. The default is 6.
spatial_weights: [‘queen’, ‘rook’] or libpysal.weights.W object: spatial weights matrix specification. By default, geosnap will calculate Rook weights, but you can also pass a libpysal.weights.W object for more control over the specification.
method: str in [‘ward_spatial’, ‘kmeans_spatial’, ‘spenc’, ‘skater’, ‘azp’, ‘max_p’]: the clustering algorithm used to identify neighborhood types.
columns: array_like: subset of columns on which to apply the clustering.
threshold_variable: str: for max-p, which variable should define p. The default is “count”, which will grow regions until the threshold number of polygons have been aggregated.
threshold: numeric: threshold to use for max-p clustering (the default is 10).
temporal_index: str: which column on the dataframe defines time and/or sequencing of the long-form data. Default is “year”.
unit_index: str: which column on the long-form dataframe identifies the stable units over time. In a wide-form dataset, this would be the unique index.
weights_kwargs: dict: if passing a libpysal.weights.W object to spatial_weights, additional keyword arguments that will be passed to the weights constructor.
region_kwargs: dict: additional keyword arguments passed to the regionalization algorithm.
scaler: None or scaler class from sklearn.preprocessing: a scikit-learn preprocessing class that will be used to rescale the data. Defaults to sklearn.preprocessing.StandardScaler.
model_colname: str: column name for storing cluster labels on the output dataframe. If no name is provided, the column will be named after the clustering method. If there is already a column named after the clustering method, the name will be incremented with a number.
return_model: bool: if True, also return a dictionary of fitted classes from the regionalization provider.
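
The difference between the ‘rook’ and ‘queen’ options for spatial_weights can be sketched on a regular grid: rook contiguity links cells that share an edge, while queen contiguity also links cells that share only a corner. The sketch below uses plain Python rather than libpysal, and the helper `grid_neighbors` is hypothetical; real neighborhood polygons are irregular, so actual neighbor counts will vary.

```python
def grid_neighbors(n_rows, n_cols, rule="rook"):
    """Map each (row, col) cell to its neighbors under 'rook' or 'queen' contiguity."""
    if rule not in ("rook", "queen"):
        raise ValueError("rule must be 'rook' or 'queen'")
    # Rook: cells sharing an edge. Queen: cells sharing an edge or a corner.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if rule == "queen":
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors[(r, c)] = sorted(
                (r + dr, c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            )
    return neighbors


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")

print(len(rook[(1, 1)]), len(queen[(1, 1)]))  # 4 8
print(rook[(0, 0)])   # [(0, 1), (1, 0)]
print(queen[(0, 0)])  # [(0, 1), (1, 0), (1, 1)]
```

Because queen weights admit more neighbors, they generally give a contiguity-constrained algorithm more ways to join units into a region than rook weights do.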

Returns:

gdf: geopandas.GeoDataFrame: GeoDataFrame with neighborhood cluster labels appended as a new column. If a column named after the clustering method already exists on the DataFrame, the new column's name will be incremented.
models: dict of named tuples (only returned if return_model is True): tab-completable dictionary of named tuples keyed on the Community’s time variable (e.g. year). The tuples store model results and have attributes X, columns, labels, instance, and W, which store the input matrix, column labels, cluster labels, fitted model instance, and spatial weights matrix, respectively.
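
As a rough illustration of how such a result might be consumed, the sketch below builds a stand-in dictionary of named tuples with the attribute names listed above and summarizes cluster sizes per time period. The `ModelResults` name, the column names, and all values are hypothetical placeholders; the object geosnap returns may differ in type and content.

```python
from collections import Counter, namedtuple

# Hypothetical container mirroring the documented attribute names
ModelResults = namedtuple("ModelResults", ["X", "columns", "labels", "instance", "W"])

columns = ["median_income", "pct_poverty"]  # placeholder column names

# Stand-in for the returned dictionary, keyed on the time variable (e.g. year)
models = {
    2010: ModelResults(
        X=[[0.1, -1.2], [0.4, 0.9], [-0.5, 0.3]],
        columns=columns,
        labels=[0, 1, 1],
        instance=None,  # would hold the fitted model instance
        W=None,         # would hold the spatial weights matrix
    ),
    2020: ModelResults(
        X=[[0.2, -1.0], [0.3, 1.1], [-0.5, -0.1]],
        columns=columns,
        labels=[0, 0, 1],
        instance=None,
        W=None,
    ),
}

# Count how many units fall in each cluster, per time period
cluster_sizes = {year: Counter(result.labels) for year, result in models.items()}

print(models[2010].columns)    # ['median_income', 'pct_poverty']
print(cluster_sizes[2020][0])  # 2
```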