4  Geodemographic Clustering

import contextily as ctx
import geopandas as gpd
import matplotlib.pyplot as plt
from geosnap import DataStore
from geosnap import analyze as gaz
from geosnap import io as gio
from geosnap import visualize as gvz
%load_ext watermark
%watermark -a  'eli knaap' -iv -d -u
Author: eli knaap

Last updated: 2024-01-21

contextily: 1.4.0
geopandas : 0.14.2
matplotlib: 3.8.2
geosnap   : 0.12.1.dev9+g3a1cb0f6de61.d20240110

Geodemographic analysis, which applies unsupervised learning algorithms to demographic and socioeconomic data, is a widely used technique that falls under the broad umbrella of “spatial data science.” Technically, traditional geodemographics involves no formal spatial analysis. However, given its emphasis on geographic units of analysis (and the subsequent mapping of results), it is often viewed as a first (if not requisite) step in exploratory analysis of a particular study area.
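
To make the core operation concrete, here is a minimal sketch: plain k-means (Lloyd's algorithm) written in NumPy and applied to synthetic tract attributes. The variables, their values, and the helpers `zscore` and `kmeans` are hypothetical illustrations, not geosnap's API. k-means is only one of several algorithms used in geodemographics, and the deterministic farthest-point initialization is a simplified stand-in for k-means++.

```python
import numpy as np

def zscore(X):
    # Standardize each column to mean 0, sd 1 so no single variable dominates distances
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=100):
    # Deterministic farthest-point initialization (simplified stand-in for k-means++)
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each tract to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster ends up empty
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical tracts: columns are median income, share with a college degree, share renters
rng = np.random.default_rng(42)
n = 50
group_a = np.column_stack([rng.normal(40_000, 3_000, n),
                           rng.normal(0.20, 0.03, n),
                           rng.normal(0.60, 0.05, n)])
group_b = np.column_stack([rng.normal(110_000, 5_000, n),
                           rng.normal(0.60, 0.04, n),
                           rng.normal(0.25, 0.05, n)])
tracts = np.vstack([group_a, group_b])

labels, centers = kmeans(zscore(tracts), k=2)

# Profile each "neighborhood type" in the original units
for j in range(2):
    print(j, tracts[labels == j].mean(axis=0).round(2))
```

Standardizing first matters here: raw income (tens of thousands) would otherwise swamp proportions bounded by 0 and 1 in the Euclidean distances that k-means minimizes.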

The intellectual roots of geodemographics extend from analytical sociology and classic studies in Factorial Ecology and Social Area Analysis. Today, geodemographic analysis is routinely applied in academic studies of neighborhood segregation and neighborhood change, and it is used extensively in industry, particularly in marketing, where products like Tapestry and Mosaic are sold for their predictive power. Whereas social scientists often look at the resulting map of neighborhood types and ask how these patterns came to be, practitioners often look at the map and ask how they can use the patterns to make better strategic decisions.

In urban social science, our goal is often to understand the social composition of neighborhoods in a region, whether (and where) they have changed over time, and whether these neighborhood types are consistent over time and across places. That requires a common pipeline: collect the same set of variables, standardize them (often within each time period so they can be pooled with other periods), cluster the entire long-form dataset, and then analyze and visualize the results. Most often, this process is repeated with different combinations of variables, different algorithms, or different cluster solutions (and in different places at different times). Geosnap provides a set of tools to simplify this pipeline.
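
The standardize-then-pool step of that pipeline can be sketched in plain pandas. This is an illustration under stated assumptions, not geosnap's implementation: the long-form table, its column names (`geoid`, `year`, `p_renters`), and the simulated income increase between years are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical long-form table: one row per tract per year
rng = np.random.default_rng(0)
tract_ids = [f"t{i:03d}" for i in range(20)]
frames = []
for year, shift in [(2010, 0), (2020, 15_000)]:  # shift: hypothetical region-wide income growth
    frames.append(pd.DataFrame({
        "geoid": tract_ids,
        "year": year,
        "median_household_income": rng.normal(50_000 + shift, 10_000, len(tract_ids)),
        "p_renters": rng.uniform(0.1, 0.8, len(tract_ids)),
    }))
long = pd.concat(frames, ignore_index=True)

columns = ["median_household_income", "p_renters"]

# z-score each variable *within* its year, so each tract is compared with
# others from the same period and the region-wide shift is removed before pooling
standardized = long.copy()
standardized[columns] = long.groupby("year")[columns].transform(
    lambda s: (s - s.mean()) / s.std()
)

# Pool all periods into one matrix; a clustering model would be fit on X,
# and its labels joined back to (geoid, year) to track neighborhood change
X = standardized[columns].to_numpy()
print(X.shape)  # (40, 2): 20 tracts x 2 years
```

Whether to standardize within or across periods is an analytic choice: within-period scaling emphasizes each tract's relative position, while scaling the pooled data preserves region-wide shifts such as overall income growth.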

datasets = DataStore()
atl = gio.get_acs(datasets, msa_fips="12060", years=2021, level="tract")
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/util.py:275: UserWarning: Unable to find local adjustment year for 2021. Attempting from online data
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/constructors.py:215: UserWarning: Currency columns unavailable at this resolution; not adjusting for inflation
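# interactive choropleth of median household income on a light basemap
atl[["median_household_income", "geometry"]].explore(
    column="median_household_income",
    tiles="CartoDB Positron",
)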