Network API

API

class pandarm.network.Network(node_x, node_y, edge_from, edge_to, edge_weights, twoway=True, edge_geom=None, crs=None)

Create the transportation network in the city. Typical data would be distance based from OpenStreetMap or travel time from GTFS transit data.

Parameters:

node_x (pandas.Series, float): Defines the x attribute for nodes in the network (e.g. longitude).
node_y (pandas.Series, float): Defines the y attribute for nodes in the network (e.g. latitude). This parameter and node_x should share the same index, which should be the node IDs referred to by the edges below.
edge_from (pandas.Series, int): Defines the node ID that begins an edge; should refer to the index of the two Series objects above.
edge_to (pandas.Series, int): Defines the node ID that ends an edge; should refer to the index of the two Series objects above.
edge_weights (pandas.DataFrame, all numerics): Specifies one or more impedances on the network, which define the distances between nodes. Multiple impedances can be used to capture travel times at different times of day, for instance.
twoway (bool, optional): Whether the edges in this network are two-way or one-way (where the single direction runs from the from node to the to node). If twoway=True, it is assumed that each from and to ID pair in the edge table occurs once and that travel can occur in both directions on the single edge record. Pandarm will internally flip and append the from and to IDs to the original edges to create a two-direction network. If twoway=False, it is assumed that travel can only occur in the explicit direction indicated by the from and to ID in the edge table.
edge_geom (bool | gpd.GeometryArray | gpd.GeoSeries, default None): Array-like (typically a GeoSeries) of geometries representing the geometric shape of each edge.
crs (str | pyproj.CRS, default None): Coordinate system in which node x and y coordinates (and edge_geom, if provided) are stored. If None, it is assumed the coordinates are geographic (i.e. WGS84, EPSG:4326).
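
The effect of twoway=True described above can be pictured with a small pure-Python sketch. It only illustrates the "flip and append" idea; the helper name expand_twoway and the tuple layout are hypothetical, and pandarm's internal representation may differ.

```python
# Conceptual sketch only: not pandarm's actual implementation.
def expand_twoway(edges):
    """Return directed edges for a two-way network.

    `edges` is a list of (from_id, to_id, weight) tuples, each listed once.
    """
    # Flip every record so travel is possible in the reverse direction too.
    flipped = [(to_id, from_id, weight) for from_id, to_id, weight in edges]
    return edges + flipped


edges = [(1, 2, 100.0), (2, 3, 50.0)]
directed = expand_twoway(edges)
print(directed)
```

Because the flipped records reuse the original weight, the impedance is necessarily the same in both directions, which is why a one-way or asymmetric network needs twoway=False with each direction listed explicitly.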

aggregate(distance, func='sum', decay='linear', imp_name=None, name='tmp', type=None)

Aggregate information for every source node in the network - this is really the main purpose of this library. This allows you to touch the data specified by calling set and perform some aggregation on it within the specified distance. For instance, summing the population within 1000 meters.

Parameters:

distance (float)

The maximum distance to aggregate data within. ‘distance’ can represent any impedance unit that you have set as your edge weight. This will usually be a distance unit in meters; however, if you have customized the impedance, this could be in other units such as utility or time.

func (str, optional, default ‘sum’)

The type of aggregation: ‘mean’ (with ‘ave’, ‘avg’, ‘average’ as aliases), ‘std’ (or ‘stddev’), ‘sum’, ‘count’, ‘min’, ‘max’, ‘med’ (or ‘median’), ‘25pct’, or ‘75pct’. (Quantiles are computed by sorting so may be slower than the others.)

decay (str, optional, default ‘linear’)

The type of decay to apply, which makes things that are further away count less in the aggregation: ‘linear’, ‘exponential’, or ‘flat’ (no decay).

Additional notes: see aggregateAccessibilityVariable in accessibility.cpp to read through the code that applies decays. The exponential decay function is exp(-1*distance/radius)*var. The decay setting only operates on ‘sum’ and ‘mean’ aggregations. If you apply decay to a ‘mean’, the result will NOT be a weighted average; it will be the mean of the post-decay values. (So for a ‘mean’ aggregation, you need to explicitly set decay to ‘flat’ unless you want that.)

imp_name (str, optional)

The impedance name to use for the aggregation on this network. Must be one of the impedance names passed in the constructor of this object. If not specified, there must be only one impedance passed in the constructor, which will be used.

name (str, optional)

The variable to aggregate. This variable will have been created and named by a call to set. If not specified, the default variable name will be used so that the most recent call to set without giving a name will be the variable used.

Returns:

agg (pandas.Series): A Pandas Series with one value per origin node in the network. The index is the same as the node IDs passed to the init method, and the values are the aggregations for each source node.
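
The decay note for aggregate says that a decayed ‘mean’ is the mean of the post-decay values rather than a weighted average. The sketch below illustrates the difference in plain Python. The exponential form is taken from that note; the linear form (1 - distance/radius) is an assumption made here, and all helper names are hypothetical rather than part of pandarm.

```python
import math


def decay_weight(distance, radius, decay="linear"):
    """Weight applied to a value found `distance` away from the source node."""
    if decay == "flat":
        return 1.0  # no decay
    if decay == "exponential":
        return math.exp(-1 * distance / radius)  # form quoted in the note
    if decay == "linear":
        return 1.0 - distance / radius  # assumed form, not confirmed by the docs
    raise ValueError(f"unknown decay: {decay}")


def decayed_mean(values, distances, radius, decay):
    """Mean of the post-decay values (what the note describes)."""
    decayed = [decay_weight(d, radius, decay) * v for v, d in zip(values, distances)]
    return sum(decayed) / len(decayed)


def weighted_average(values, distances, radius, decay):
    """A true weighted average, shown only for contrast."""
    weights = [decay_weight(d, radius, decay) for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


values = [10.0, 10.0]      # the same value observed at two nodes
distances = [0.0, 500.0]   # impedance from the source node
flat_mean = decayed_mean(values, distances, 1000.0, "flat")
linear_mean = decayed_mean(values, distances, 1000.0, "linear")
linear_weighted = weighted_average(values, distances, 1000.0, "linear")
print(flat_mean, linear_mean, linear_weighted)
```

With identical input values, the weighted average stays at the input value while the post-decay mean drops, which is the behaviour the note warns about.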

property bbox: The bounding box for nodes in this network [xmin, ymin, xmax, ymax]

classmethod from_gdf(gdf, network_type='walk', twoway=False, add_travel_times=False, default_speeds=None)

Create a pandarm.Network object from a geodataframe (via OSMnx graph).

Parameters:

gdf (geopandas.GeoDataFrame): GeoDataFrame covering the study area of interest. Note that the first step is to take the unary union of this dataframe, which is expensive, so large dataframes may be time-consuming. The network will inherit the CRS from this dataframe.
network_type (str, {“all_private”, “all”, “bike”, “drive”, “drive_service”, “walk”}): The type of network to collect from OSM (passed to osmnx.graph_from_polygon); by default “walk”.
twoway (bool, optional): Whether to treat the pandarm.Network as directed or undirected. For a directed network, use twoway=False (which is the default). For an undirected network (e.g. a walk network) where travel can flow in both directions, the network is more efficient when twoway=True, but this forces the impedance to be equal in both directions. This has implications for auto or multimodal networks where impedance is generally different depending on travel direction.
add_travel_times (bool, default=False): Whether to use posted travel times from OSM as the impedance measure (rather than network distance). Speeds are based on max posted drive speeds; see <https://osmnx.readthedocs.io/en/stable/internals-reference.html#osmnx-speed-module> for more information.
default_speeds (dict, optional): Default speeds assumed when no data are available on the OSM edge. Defaults to {“residential”: 35, “secondary”: 50, “tertiary”: 60}. Only considered if add_travel_times is True.

Returns:

pandarm.Network: a pandarm.Network object with node coordinates stored in the same system as the input geodataframe. If add_travel_times is True, the network impedance is travel time measured in seconds (assuming automobile travel speeds); else the impedance is travel distance measured in meters

Raises:

ImportError: requires osmnx, raises if module not available

classmethod from_hdf5(filename)

Load a previously saved Network from a Pandas HDF5 file.

Parameters:

filename (str)

Returns:

network (pandarm.Network)

get_node_ids(x_col, y_col, mapping_distance=None)

Assign ID of the nearest node to data specified by x_col and y_col.

Returns a Pandas Series of node_ids for each x, y in the input data, representing the nearest node in the Euclidean space. The index is the same as the indexes of the x, y input data, and the values are the mapped node_ids. If mapping distance is not passed and if there are no NaNs in the x, y data, this will be the same length as the x, y data. If the mapping is imperfect, this function returns all the input x, y’s that were successfully mapped to node_ids.

Parameters:

x_col (pandas.Series, float): A Pandas Series where values specify the x (e.g. longitude) location of the dataset.
y_col (pandas.Series, float): A Pandas Series where values specify the y (e.g. latitude) location of the dataset. x_col and y_col should use the same index.
mapping_distance (float, optional): The maximum distance in Euclidean space that will be considered a match between the x, y data and the nearest node in the network. This will be a distance in the units of the x, y and node coordinates (usually meters). If not specified, every x, y coordinate will be mapped to the nearest node.

Returns:

node_ids (pandas.Series, int)
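
The mapping behaviour described for get_node_ids can be sketched in plain Python as a brute-force nearest-neighbour search. This is an illustration under stated assumptions, not pandarm's implementation: the function name, the dict-based inputs, and the inclusive comparison against mapping_distance are all hypothetical choices.

```python
import math


def nearest_node_ids(points, nodes, mapping_distance=None):
    """Map each point to the ID of its nearest node in Euclidean space.

    points: dict of point_id -> (x, y)
    nodes:  dict of node_id -> (x, y)
    Points with NaN coordinates, or farther than mapping_distance from
    every node, are left out of the result.
    """
    result = {}
    for point_id, (px, py) in points.items():
        if math.isnan(px) or math.isnan(py):
            continue  # a point without coordinates cannot be mapped
        node_id, dist = min(
            ((nid, math.hypot(px - nx, py - ny)) for nid, (nx, ny) in nodes.items()),
            key=lambda pair: pair[1],
        )
        # Assumption: a point exactly at mapping_distance still counts as a match.
        if mapping_distance is None or dist <= mapping_distance:
            result[point_id] = node_id
    return result


nodes = {10: (0.0, 0.0), 20: (100.0, 0.0)}
points = {"a": (10.0, 0.0), "b": (90.0, 5.0), "c": (60.0, 400.0)}
print(nearest_node_ids(points, nodes))
print(nearest_node_ids(points, nodes, mapping_distance=50.0))
```

The second call returns fewer entries than there are input points, mirroring the note that an imperfect mapping returns only the successfully mapped rows.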

low_connectivity_nodes(impedance, count, imp_name=None)

Identify nodes that are connected to fewer than some threshold of other nodes within a given distance.

Parameters:

impedance (float): Distance within which to search for other connected nodes. This will usually be a distance unit in meters; however, if you have customized the impedance, this could be in other units such as utility or time.
count (int): Threshold for connectivity. If a node is connected to fewer than this many nodes within impedance, it will be identified as “low connectivity”.
imp_name (str, optional): The impedance name to use for the aggregation on this network. Must be one of the impedance names passed in the constructor of this object. If not specified, there must be only one impedance passed in the constructor, which will be used.

Returns:

node_ids (array): List of “low connectivity” node IDs.

nearest_pois(distance, category, num_pois=1, max_distance=None, imp_name=None, include_poi_ids=False)

Find the distance to the nearest points of interest (POIs) from each source node. Larger values mean lower accessibility.

Parameters:

distance (float): The maximum distance to look for POIs. This will usually be a distance unit in meters; however, if you have customized the impedance, this could be in other units such as utility or time.
category (str): The name of the category of POI to look for.
num_pois (int): The number of POIs to look for; this also sets the number of columns in the DataFrame that gets returned.
max_distance (float, optional): The value to set the distance to if there is no POI within the specified distance; if not specified, it is set to distance. This will usually be a distance unit in meters; however, if you have customized the impedance, this could be in other units such as utility or time.
imp_name (str, optional): The impedance name to use for the aggregation on this network. Must be one of the impedance names passed in the constructor of this object. If not specified, there must be only one impedance passed in the constructor, which will be used.
include_poi_ids (bool, optional): If this flag is set to True, the call will add columns to the returned DataFrame: instead of just returning the distance to the nth POI, it will also return the ID of that POI. The names of the columns with the POI IDs will be poi1, poi2, etc. Including these IDs takes roughly twice as long as omitting them.

Returns:

d (pandas.DataFrame): Like aggregate, the result has an index of all the node IDs for the network. Unlike aggregate, this method returns a DataFrame with one column per requested POI, each holding the distance to the Nth closest POI. For instance, if you ask for the 10 closest POIs to each node, column d[1] will be the distance to the 1st closest POI of that category, column d[2] will be the distance to the 2nd closest POI, and so on.
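
The shape of one row of this result, including the max_distance fill value, can be illustrated with a short pure-Python sketch. The function name and the list-of-distances input are hypothetical, and the sketch assumes the network distances to each POI are already known; it does not reproduce how pandarm computes them.

```python
def nearest_poi_row(poi_distances, distance, num_pois=1, max_distance=None):
    """Distances to the num_pois closest POIs for a single source node.

    poi_distances: network distances from the node to every POI in a category.
    Missing ranks are filled with max_distance (or distance if not given).
    """
    fill = distance if max_distance is None else max_distance
    # Keep only POIs inside the search distance, closest first.
    in_range = sorted(d for d in poi_distances if d <= distance)[:num_pois]
    padded = in_range + [fill] * (num_pois - len(in_range))
    # Columns are keyed 1..num_pois, matching d[1], d[2], ... in the text.
    return {rank: d for rank, d in enumerate(padded, start=1)}


row = nearest_poi_row([820.0, 130.0, 2400.0], distance=1000.0, num_pois=3)
row_filled = nearest_poi_row(
    [820.0, 130.0, 2400.0], distance=1000.0, num_pois=3, max_distance=9999.0
)
print(row)
print(row_filled)
```

Only two POIs fall inside the search distance here, so the third column holds the fill value rather than a real distance.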

property node_ids: The node IDs which will be used as the index of many return series

nodes_in_range(nodes, radius, imp_name=None)

Computes the range queries (the reachable nodes within this maximum distance) for each input node.

Parameters:

nodes (list-like of ints): Source node IDs.
radius (float): Maximum distance to use. This will usually be a distance unit in meters; however, if you have customized the impedance (using the imp_name option), this could be in other units such as utility or time.
imp_name (str, optional): The impedance name to use for the aggregation on this network. Must be one of the impedance names passed in the constructor of this object. If not specified, there must be only one impedance passed in the constructor, which will be used.

Returns:

d (pandas.DataFrame): Like nearest_pois, this is a dataframe containing the input node index, the index of the nearby nodes within the search radius, and the distance (according to the requested impedance) from the source to the nearby node.
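
A range query of this kind can be sketched with a distance-bounded Dijkstra search. The code below is a generic textbook approach written for illustration, not necessarily the algorithm pandarm uses; the function name and adjacency-dict layout are hypothetical, and whether the source node itself appears in pandarm's output is not stated in the text above.

```python
import heapq


def nodes_within(adjacency, source, radius):
    """Return {node_id: distance} for nodes reachable from source within radius.

    adjacency: dict of node_id -> list of (neighbor_id, weight) pairs.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, weight in adjacency.get(node, []):
            new_dist = dist + weight
            # Never expand beyond the search radius.
            if new_dist <= radius and new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return best


adjacency = {
    1: [(2, 100.0), (3, 400.0)],
    2: [(1, 100.0), (3, 150.0)],
    3: [(1, 400.0), (2, 150.0)],
}
print(nodes_within(adjacency, 1, 300.0))
print(nodes_within(adjacency, 1, 200.0))
```

Node 3 is reached at 250 via node 2 rather than at 400 over the direct edge, so it falls inside a radius of 300 but outside a radius of 200.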

plot(data, bbox=None, plot_type='scatter', fig_kwargs=None, plot_kwargs=None, cbar_kwargs=None)

Plot an array of data on a map using Matplotlib, automatically matching the data to the pandarm network node positions. Keyword arguments are passed to the plotting routine.

Parameters:

data (pandas.Series): Numeric data with the same length and index as the nodes in the network.
bbox (tuple, optional): (lat_min, lng_min, lat_max, lng_max)
plot_type ({‘hexbin’, ‘scatter’}, optional)
fig_kwargs (dict, optional): Keyword arguments that will be passed to matplotlib.pyplot.subplots. Use this to specify things like figure size or background color.
plot_kwargs (dict, optional): Keyword arguments that will be passed to the matplotlib plotting command. Use this to control plot styles and color maps.
cbar_kwargs (dict, optional): Keyword arguments that will be passed to matplotlib.pyplot.colorbar. Use this to control color bar location and label.

Returns:

fig (matplotlib.Figure)
ax (matplotlib.Axes)

precompute(distance)

Precomputes the range queries (the reachable nodes within this maximum distance). As long as you later use a smaller distance, cached results will be used.

Parameters:

distance (float): The maximum distance to use. This will usually be a distance unit in meters; however, if you have customized the impedance, this could be in other units such as utility or time.

Returns:

Nothing

save_hdf5(filename, rm_nodes=None, complevel=None, complib=None)

Save network data to a Pandas HDF5 file.

Only the nodes and edges of the actual network are saved; points of interest and data attached to nodes are not saved.

Parameters:

filename (str)
rm_nodes (array_like): A list, array, Index, or Series of node IDs that should not be saved as part of the Network.

set(node_ids, variable=None, name='tmp')

Characterize urban space with a variable that is related to nodes in the network.

Parameters:

node_ids (pandas.Series, int): A series of node IDs, which are usually computed using get_node_ids on this object.
variable (pandas.Series, numeric, optional): A series which represents some variable defined in urban space. It could be the location of buildings, or the income of all households. Just about anything can be aggregated using the network queries provided here, and this method provides the API to set the variable at its disaggregate locations. Note that node_ids and variable should have the same index (although the index is not actually used). If variable is not set, it is assumed that the variable is all “ones” at the locations specified by node_ids. This could be, for instance, the location of all coffee shops, which don’t really have a variable to aggregate. The variable is connected to the closest node in the pandarm network, which assumes no impedance between the location of the variable and the location of the closest network node.
name (str, optional): Name the variable. This is optional in the sense that if you don’t specify it, the default name will be used. Since the same default name is used by aggregate on this object, you can alternate between set and aggregate calls without setting names.

Returns:

Nothing
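
The way set pairs observations with nodes, and the "all ones" default when no variable is given, can be pictured with the sketch below. It is a conceptual illustration only: the function name and the dict-of-lists storage are hypothetical and say nothing about how pandarm stores the data internally.

```python
from collections import defaultdict


def attach_to_nodes(node_ids, variable=None):
    """Group observations by the network node they were mapped to.

    node_ids and variable are paired by position, so they must be equally long.
    """
    if variable is None:
        # No variable given: treat every location as a count of one.
        variable = [1.0] * len(node_ids)
    by_node = defaultdict(list)
    for node_id, value in zip(node_ids, variable):
        by_node[node_id].append(value)
    return dict(by_node)


coffee_shops = attach_to_nodes([5, 5, 7])            # locations only
incomes = attach_to_nodes([5, 7], [42000.0, 58000.0])  # one value per household
print(coffee_shops)
print(incomes)
```

Summing the first result at each node gives a count of coffee shops per node, which is why locations without an attribute can still be aggregated.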

set_pois(category, maxdist, maxitems, x_col=None, y_col=None, mapping_distance=None)

Set the location of all the points of interest (POIs) of this category. The POIs are connected to the closest node in the pandarm network, which assumes no impedance between the location of the POI and the location of the closest network node.

Parameters:

category (str): The name of the category for this set of POIs.
maxdist (float): The maximum distance that will later be used in nearest_pois().
maxitems (int): The maximum number of items that will later be requested in nearest_pois().
x_col (pandas.Series, float): The x location (longitude) of POIs in this category.
y_col (pandas.Series, float): The y location (latitude) of POIs in this category.
mapping_distance (float, optional): The maximum distance that will be considered a match between the POIs and the nearest node in the network. This will usually be a distance unit in meters; however, if you have customized the impedance, this could be in other units such as utility or time. If not specified, every POI will be mapped to the nearest node.

Returns:

Nothing

shortest_path(node_a, node_b, imp_name=None)

Return the shortest path between two node IDs in the network. Must provide an impedance name if more than one is available.

Parameters:

node_a (int): Source node ID
node_b (int): Destination node ID
imp_name (str, optional): The impedance name to use for the shortest path

Returns:

path (np.ndarray): Nodes that are traversed in the shortest path
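
What "nodes that are traversed in the shortest path" means on a directed network can be shown with a plain Dijkstra search that tracks predecessors. This is a generic illustration, not necessarily the routing algorithm pandarm uses; the function name dijkstra_path and the adjacency-dict input are hypothetical.

```python
import heapq


def dijkstra_path(adjacency, node_a, node_b):
    """Return (path, length) from node_a to node_b, or (None, inf) if unreachable.

    adjacency: dict of node_id -> list of (neighbor_id, weight) pairs.
    """
    best = {node_a: 0.0}
    previous = {}
    queue = [(0.0, node_a)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == node_b:
            break  # the destination's distance is final once it is popped
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in adjacency.get(node, []):
            new_dist = dist + weight
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                previous[neighbor] = node
                heapq.heappush(queue, (new_dist, neighbor))
    if node_b not in best:
        return None, float("inf")
    # Walk the predecessor links back from the destination.
    path = [node_b]
    while path[-1] != node_a:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[node_b]


# One-way edges only: travel from 3 back to 1 is impossible.
adjacency = {1: [(2, 100.0), (3, 400.0)], 2: [(3, 150.0)], 3: []}
print(dijkstra_path(adjacency, 1, 3))
print(dijkstra_path(adjacency, 3, 1))
```

The route through node 2 wins over the direct edge because its total impedance is lower, and the reverse query fails because the edges are directed.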

shortest_path_geometry(node_a, node_b, imp_name=None)

Return the path from node_a to node_b as a geodataframe

Parameters:

node_a (int, str): index of origin node in the Network
node_b (int, str): index of destination node in the Network
imp_name (str, optional): name of the Network impedance to use, by default None

Returns:

gpd.GeoDataFrame: dataframe of line geometries representing the shortest path from node_a to node_b

Raises:

ValueError: fails if geometry column not present in the edges_df table

shortest_path_geoms(nodes_a, nodes_b, imp_name=None)

Return geometric representation of the shortest path between pairs of nodes as a geodataframe

Parameters:

nodes_a (list-like of origin node_ids): array of origin node_ids
nodes_b (list-like of destination node_ids): array of destination node_ids
imp_name (str, optional): name of impedance column in the Network, by default None

Returns:

gpd.GeoDataFrame: dataframe of line geometries representing the shortest path for each pair of nodes in nodes_a and nodes_b

shortest_path_length(node_a, node_b, imp_name=None)

Return the length of the shortest path between two node IDs in the network. Must provide an impedance name if more than one is available.

If you have a large number of paths to calculate, don’t use this function! Use the vectorized shortest_path_lengths instead.

Parameters:

node_a (int): Source node ID
node_b (int): Destination node ID
imp_name (str): The impedance name to use for the shortest path

Returns:

length (float)

shortest_path_lengths(nodes_a, nodes_b, imp_name=None)

Vectorized calculation of shortest path lengths. Accepts a list of origins and list of destinations and returns a corresponding list of shortest path lengths. Must provide an impedance name if more than one is available.

Parameters:

nodes_a (list-like of ints): Source node IDs
nodes_b (list-like of ints): Corresponding destination node IDs
imp_name (str): The impedance name to use for the shortest path

Returns:

lengths (list of floats)

shortest_paths(nodes_a, nodes_b, imp_name=None)

Vectorized calculation of shortest paths. Accepts a list of origins and list of destinations and returns a corresponding list of shortest path routes. Must provide an impedance name if more than one is available.

Parameters:

nodes_a (list-like of ints): Source node IDs
nodes_b (list-like of ints): Corresponding destination node IDs
imp_name (str): The impedance name to use for the shortest path

Returns:

paths (list of np.ndarray): Nodes traversed in each shortest path

to_crs(output_crs, input_crs=None)

Reproject a pandarm.Network object into another coordinate system.

Note that this function does not affect the weight/impedance of any network edges; it only reprojects the x and y coordinates of the nodes (e.g. for precise snapping between nodes and projected origin/destination data).

Parameters:

network (pandarm.Network): an instantiated pandarm Network object
input_crs (int, optional): the coordinate system used in the Network.node_df dataframe. Typically these data are collected in Lon/Lat, so the default is 4326. If None, but there is a geometry column present in the Network, the input CRS is inferred
output_crs (int, str, or pyproj.crs.CRS, required): EPSG code or pyproj.crs.CRS object of the output coordinate system

Returns:

pandarm.Network: an initialized pandarm.Network with ‘x’ and ‘y’ values represented by coordinates in the specified CRS