API reference

IO Module

Accessing Datasets

The DataStore class provides access to a fast and efficient database of neighborhood indicators for the United States. The DataStore can read information directly over the web, or it can cache the datasets locally for (shared) repeated use. Large datasets are available quickly with no configuration by accessing methods on the class.
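As a minimal sketch of the access pattern described above, the snippet below wraps a call to one of the DataStore methods listed in this section. It assumes geosnap is installed and that the `states` argument accepts a list of two-character state FIPS strings; `normalize_fips` and `load_tracts` are hypothetical helpers written for this illustration, not part of geosnap.

```python
def normalize_fips(code):
    """Return a two-character, zero-padded state FIPS string (e.g. 6 -> "06")."""
    return str(code).strip().zfill(2)


def load_tracts(states):
    """Load 2010 tract boundaries for the given states.

    Assumption: geosnap is installed and `states` takes a list of state
    FIPS strings. The import is deferred so the helper above can be used
    without geosnap present.
    """
    from geosnap import DataStore

    datasets = DataStore()  # streams over the web unless data are cached locally
    return datasets.tracts_2010(states=[normalize_fips(s) for s in states])
```

Calling `load_tracts([11])` would then request the 2010 tracts for the state with FIPS code "11", reading from local storage if the data have been cached.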

DataStore([data_dir, disclaimer])

Storage for geosnap data.

DataStore.acs([year, level, states])

American Community Survey Data (5-year estimates).

DataStore.bea_regions()

Table that maps states to their respective BEA regions

DataStore.blocks_2000([states, fips])

Census blocks for 2000.

DataStore.blocks_2010([states, fips])

Census blocks for 2010.

DataStore.blocks_2020([states, fips])

Census blocks for 2020.

DataStore.codebook()

Codebook.

DataStore.counties()

Nationwide counties as drawn in 2010.

DataStore.ejscreen([year, states])

EPA EJScreen Data <https://www.epa.gov/ejscreen>.

DataStore.lodes_codebook()

Codebook for LODES data.

DataStore.ltdb()

Longitudinal Tract Database (LTDB).

DataStore.msa_definitions()

2010 Metropolitan Statistical Area definitions.

DataStore.msas()

Metropolitan Statistical Areas as drawn in 2020.

DataStore.ncdb()

Geolytics Neighborhood Change Database (NCDB).

DataStore.nces([year, dataset])

National Center for Education Statistics (NCES) Data.

DataStore.show_data_dir([verbose])

Print the location of the local geosnap data storage directory.

DataStore.states()

States.

DataStore.tracts_1990([states])

Nationwide Census Tracts as drawn in 1990 (cartographic 500k).

DataStore.tracts_2000([states])

Nationwide Census Tracts as drawn in 2000 (cartographic 500k).

DataStore.tracts_2010([states])

Nationwide Census Tracts as drawn in 2010 (cartographic 500k).

DataStore.tracts_2020([states])

Nationwide Census Tracts as drawn in 2020 (cartographic 500k).

Storing data

The io module includes functions for caching data on your local machine, either to store the datasets locally for repeated use or to register an external dataset with geosnap, such as the Longitudinal Tract Database (LTDB) or the Neighborhood Change Database (NCDB). Once data are stored, a newly instantiated DataStore uses the local files instead of streaming over the web.

io.store_acs([years, level, data_dir])

Save census American Community Survey 5-year data to the local geosnap storage.

io.store_census([data_dir, verbose])

Save census data to the local quilt package storage.

io.store_blocks_2000([data_dir])

Save 2000 census block data to the local quilt package storage.

io.store_blocks_2010([data_dir])

Save 2010 census block data to the local quilt package storage.

io.store_blocks_2020([data_dir])

Save 2020 census block data to the local quilt package storage.

io.store_ejscreen([years, data_dir])

Save EPA EJScreen data to the local geosnap storage.

io.store_ltdb(sample, fullcount[, data_dir])

Read & store data from Brown's Longitudinal Tract Database (LTDB).

io.store_ncdb(filepath[, data_dir])

Read & store data from Geolytics's Neighborhood Change Database.

io.store_nces([years, dataset, data_dir])

Save NCES data to the local geosnap storage.

Querying datasets

io.get_acs(datastore[, level, state_fips, ...])

Extract a subset of data from the American Community Survey (ACS).

io.get_census(datastore[, state_fips, ...])

Extract a subset of data from the decennial U.S. Census.

io.get_ejscreen(datastore[, state_fips, ...])

Extract a subset of data from the EPA EJSCREEN as a long-form geodataframe.

io.get_gadm(code[, level, use_fsspec, gpkg, ...])

Collect data from GADM as a geodataframe.

io.get_lodes(datastore[, state_fips, ...])

Extract a subset of data from Census LEHD/LODES.

io.get_ltdb(datastore[, state_fips, ...])

Extract a subset of data from the Longitudinal Tract Database (LTDB) as a long-form geodataframe.

io.get_nces(datastore[, years, dataset])

Extract a subset of data from the National Center for Education Statistics as a long-form geodataframe.

io.get_ncdb(datastore[, state_fips, ...])

Extract a subset of data from the Neighborhood Change Database (NCDB).

io.get_network_from_gdf(gdf[, network_type, ...])

Create a pandana.Network object from a geodataframe (via OSMnx graph).

io.project_network(network[, output_crs, ...])

Reproject a pandana.Network object into another coordinate system.

Analyze Module

Neighborhood Clustering Methods

Model neighborhood differentiation using multivariate clustering algorithms

analyze.cluster(gdf[, n_clusters, method, ...])

Create a geodemographic typology by running a cluster analysis on the study area's neighborhood attributes.

analyze.find_k(gdf[, method, columns, ...])

Brute-force search through cluster fit metrics to determine the optimal number of k clusters

analyze.find_region_k(gdf[, method, ...])

Brute-force search through cluster fit metrics to determine the optimal number of k regions

analyze.regionalize(gdf[, n_clusters, ...])

Create a spatial geodemographic typology by running a cluster analysis on the metro area's neighborhood attributes and including a contiguity constraint.
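The brute-force search for k described above can be illustrated with a small, self-contained sketch. It is not geosnap's implementation: it uses a basic one-dimensional k-means and the mean silhouette score as the single fit metric, and every function name here (`kmeans_1d`, `mean_silhouette`, `brute_force_k`) is hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster 1-D values into k groups with a basic, deterministic k-means."""
    ordered = sorted(values)
    # Seed centroids at evenly spaced positions in the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid, then update the centroids.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def mean_silhouette(values, labels):
    """Mean silhouette score; returns -1.0 (worst score, a convention of this
    sketch) when fewer than two clusters are present."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        same = [abs(v - u) for j, (u, l) in enumerate(zip(values, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within the unit's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(abs(v - u) for u, l in zip(values, labels) if l == other)
            / labels.count(other)
            for other in clusters if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def brute_force_k(values, min_k=2, max_k=5):
    """Fit every k in the range and return the best k plus all fit scores."""
    fits = {k: mean_silhouette(values, kmeans_1d(values, k))
            for k in range(min_k, max_k + 1)}
    return max(fits, key=fits.get), fits


values = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
best_k, fits = brute_force_k(values)
print(best_k)  # 3 for these three well-separated groups
```

A real search would typically compare several fit metrics over multivariate data; the sketch only shows the fit-every-k-then-compare loop.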

Neighborhood Dynamics Methods

Model neighborhood change using optimal-matching algorithms or spatial discrete Markov chains

analyze.draw_sequence_from_gdf(gdf, w, ...)

Draw a set of class labels for each unit in a geodataframe using transition probabilities defined by a giddy.Spatial_Markov model and the spatial lag of each unit.

analyze.linc(labels_sequence)

Local Indicator of Neighborhood Change

analyze.lincs_from_gdf(gdf, unit_index, ...)

Generate local indicators of neighborhood change from a long-form geodataframe

analyze.sequence(gdf, cluster_col[, ...])

Pairwise sequence analysis and sequence clustering.

analyze.transition(gdf, cluster_col[, ...])

Model neighborhood change as a discrete spatial Markov process.
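To make the Markov framing concrete, the sketch below estimates a plain first-order transition matrix from sequences of neighborhood class labels. It is a simplified, non-spatial illustration: a spatial Markov model would estimate a separate matrix for each class of spatial lag. The function name and the example labels are hypothetical.

```python
from collections import Counter


def transition_matrix(sequences, classes):
    """Estimate first-order Markov transition probabilities from label sequences.

    sequences: one list of class labels per unit, ordered by time period.
    Returns a nested dict: matrix[origin][destination] -> probability.
    """
    counts = Counter()
    for seq in sequences:
        # Count each observed move between consecutive time periods.
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    matrix = {}
    for origin in classes:
        row_total = sum(counts[(origin, dest)] for dest in classes)
        matrix[origin] = {
            dest: (counts[(origin, dest)] / row_total if row_total else 0.0)
            for dest in classes
        }
    return matrix


# Three units observed over four periods, each labeled "A" or "B".
sequences = [
    ["A", "A", "B", "B"],
    ["A", "B", "B", "B"],
    ["B", "B", "A", "A"],
]
probabilities = transition_matrix(sequences, classes=["A", "B"])
print(probabilities["A"])  # {'A': 0.5, 'B': 0.5}
print(probabilities["B"])  # {'A': 0.2, 'B': 0.8}
```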

Segregation Dynamics Methods

Rapidly compute and compare changes in segregation measures over time and across space

analyze.segdyn.singlegroup_tempdyn(gdf[, ...])

Batch compute singlegroup segregation indices for each time period in parallel.

analyze.segdyn.multigroup_tempdyn(gdf[, ...])

Batch compute multigroup segregation indices for each time period.

analyze.segdyn.spacetime_dyn(gdf[, ...])

Batch compute multiscalar segregation profiles for each time period in parallel.
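The idea of batch-computing an index for each time period can be sketched with the dissimilarity index, a common single-group measure defined as D = 0.5 * sum(|g_i/G - o_i/O|) over units. The code below is an illustration only: the record layout and function names are assumptions, and it runs serially rather than in parallel.

```python
def dissimilarity(group_counts, total_counts):
    """Dissimilarity index for one time period.

    group_counts: population of the group of interest in each unit.
    total_counts: total population in each unit.
    """
    others = [t - g for g, t in zip(group_counts, total_counts)]
    group_sum, other_sum = sum(group_counts), sum(others)
    return 0.5 * sum(
        abs(g / group_sum - o / other_sum) for g, o in zip(group_counts, others)
    )


def dissimilarity_by_period(records):
    """Compute the index separately for each year.

    records: iterable of (year, group_count, total_count), one row per unit
    per year (a long-form layout).
    """
    by_year = {}
    for year, group, total in records:
        groups, totals = by_year.setdefault(year, ([], []))
        groups.append(group)
        totals.append(total)
    return {year: dissimilarity(g, t) for year, (g, t) in sorted(by_year.items())}


records = [
    (2000, 90, 100), (2000, 10, 100),  # two highly uneven units
    (2010, 50, 100), (2010, 50, 100),  # two evenly mixed units
]
print(dissimilarity_by_period(records))  # roughly {2000: 0.8, 2010: 0.0}
```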

Network Analysis Methods

Compute shortest path distance along a network using pandana, and visualize travel time isochrones from local data

analyze.pdna_to_adj(origins, network, threshold)

Create an adjacency list of shortest network-based travel between origins and destinations in a pandana.Network.

analyze.isochrones_from_gdf(origins, ...[, ...])

Create travel isochrones for several origins simultaneously
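The notion of a threshold-limited adjacency list over a network can be sketched without pandana using Dijkstra's algorithm. This is a conceptual illustration under simple assumptions (an undirected graph given as edge tuples, hypothetical function name), not the library's implementation.

```python
import heapq


def network_adjacency(edges, origins, threshold):
    """List (origin, destination, cost) for every node reachable within threshold.

    edges: iterable of (node_a, node_b, cost) for an undirected network.
    """
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    adjacency = []
    for origin in origins:
        best = {origin: 0.0}  # shortest known cost to each reached node
        queue = [(0.0, origin)]
        while queue:
            dist, node = heapq.heappop(queue)
            if dist > best.get(node, float("inf")):
                continue  # stale queue entry
            for neighbor, cost in graph.get(node, []):
                new_dist = dist + cost
                # Only keep paths that stay within the travel threshold.
                if new_dist <= threshold and new_dist < best.get(neighbor, float("inf")):
                    best[neighbor] = new_dist
                    heapq.heappush(queue, (new_dist, neighbor))
        adjacency.extend(
            (origin, dest, d) for dest, d in sorted(best.items()) if dest != origin
        )
    return adjacency


edges = [("a", "b", 2), ("b", "c", 2), ("c", "d", 5), ("a", "c", 5)]
print(network_adjacency(edges, origins=["a"], threshold=4))
# [('a', 'b', 2.0), ('a', 'c', 4.0)]
```

The set of destinations reachable from one origin within the threshold is also the raw material for an isochrone: the polygon enclosing those nodes.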

The ModelResults Class

Many of geosnap’s analytics methods can return a ModelResults instance that stores additional statistics, diagnostics, and plotting methods for inspection

ModelResults.boundary_silhouette

Calculate boundary silhouette scores for each unit.

ModelResults.lincs

Calculate Local Indicators of Neighborhood Change (LINC) scores for each unit.

ModelResults.path_silhouette

Calculate path silhouette scores for each unit.

ModelResults.silhouette_scores

Calculate silhouette scores for each unit.

ModelResults.plot_boundary_silhouette([...])

Plot the boundary silhouette scores for each unit as a choropleth map.

ModelResults.plot_next_best_label([...])

Plot the next-best cluster label for each unit as a choropleth map.

ModelResults.plot_silhouette([metric, title])

Create a diagnostic plot of silhouette scores using scikit-plot.

ModelResults.plot_silhouette_map([...])

Plot the silhouette scores for each unit as a [series of] choropleth map(s).

ModelResults.plot_path_silhouette([...])

Plot the path silhouette scores for each unit as a choropleth map.

ModelResults.predict_markov_labels([w_type, ...])

Predict neighborhood labels from the model in future time periods using a spatial Markov transition model

Harmonize Module

harmonize.harmonize(gdf[, target_year, ...])

Use spatial interpolation to standardize neighborhood boundaries over time.
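One common form of spatial interpolation is area weighting, where a count is split among target zones in proportion to shared area. The sketch below shows the idea in one dimension, with intervals standing in for polygons; it is an illustration under that simplification, and which interpolation method `harmonize` applies is not stated in this reference. The function names are hypothetical.

```python
def overlap(a, b):
    """Length of the overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def area_weighted_sum(sources, targets):
    """Reallocate counts from source zones onto target zones.

    sources: list of ((start, end), value) pairs, e.g. tracts in an earlier year.
    targets: list of (start, end) zones, e.g. boundaries in the target year.
    Each source value is split in proportion to the share of the source
    zone that falls inside each target (suitable for counts, not rates).
    """
    estimates = []
    for target in targets:
        total = 0.0
        for extent, value in sources:
            share = overlap(extent, target) / (extent[1] - extent[0])
            total += value * share
        estimates.append(total)
    return estimates


sources = [((0, 10), 100), ((10, 20), 200)]  # two zones with population counts
targets = [(0, 5), (5, 20)]                  # redrawn boundaries
print(area_weighted_sum(sources, targets))   # [50.0, 250.0]
```

The total population (300) is preserved because the target zones jointly cover the source zones.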

Visualize Module

visualize.animate_timeseries(gdf[, column, ...])

Create an animated gif from a long-form geodataframe timeseries.

visualize.gif_from_path([path, figsize, ...])

Create an animated gif from a directory of image files.

visualize.indexplot_seq(df_traj, clustering)

Function for index plot of neighborhood sequences within each cluster.

visualize.plot_timeseries(gdf, column[, ...])

Plot an attribute from a geodataframe arranged as a timeseries with consistent colorscaling.

visualize.plot_transition_matrix([gdf, ...])

Plot global and spatially-conditioned transition matrices as heatmaps.

visualize.plot_transition_graphs(gdf[, ...])

Plot a network graph representation of global and spatially-conditioned transition matrices.

visualize.plot_violins_by_cluster(df, ...[, ...])

Create matrix of violin plots categorized by a discrete class variable

Util Module