Metrics Calculation and Aggregation

metric_calculation module

class backend.metrics.metric_calculation.Metric_Calculation(shapes: pandas.DataFrame, gtfs_records: pandas.DataFrame, avl_records: pandas.DataFrame, params: ROVE_params)

Bases: object

Calculate stop, stop-aggregated, route, timepoint, and timepoint-aggregated level metrics. If AVL data is provided and records for the same trip_id exist across multiple days, the trip metrics are averaged across all service dates. In other words, the metric calculation module averages metrics for the same trip, so that the metrics tables after calculation contain only unique route_id, trip_id and stop_pair combinations. This step is upstream of metric aggregation, which averages the metrics of all trips at each aggregation level.
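As an illustration of averaging over service dates, the following is a minimal sketch in plain Python. The record layout and the helper name average_over_service_dates are hypothetical and are not part of this module, which operates on pandas DataFrames.

```python
from collections import defaultdict
from statistics import mean


def average_over_service_dates(records):
    """Average a metric over service dates for each (route_id, trip_id, stop_pair).

    `records` is an iterable of
    (service_date, route_id, trip_id, stop_pair, value) tuples.
    This layout is hypothetical and used for illustration only.
    """
    grouped = defaultdict(list)
    for _service_date, route_id, trip_id, stop_pair, value in records:
        grouped[(route_id, trip_id, stop_pair)].append(value)
    # One averaged value per unique key, however many days were observed.
    return {key: mean(values) for key, values in grouped.items()}


records = [
    ("2023-03-01", "1", "t100", ("A", "B"), 4.0),
    ("2023-03-02", "1", "t100", ("A", "B"), 6.0),
    ("2023-03-01", "1", "t101", ("A", "B"), 5.0),
]
averaged = average_over_service_dates(records)
```

In this sketch, trip t100 appears on two service dates and collapses to a single averaged row, while t101 appears once and is kept as is.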

Parameters:

shapes (pd.DataFrame) – shapes table from Shape Generation
gtfs_records (pd.DataFrame) – GTFS records table
avl_records (pd.DataFrame) – AVL records table
data_option (str) – user-specified data option

Raises:

ValueError – ‘AVL’ is in data_option but the avl_records table is None

gtfs_stop_metrics: pandas.DataFrame: Initial stop-level metrics table generated from the GTFS records table.

gtfs_tpbp_metrics: Initial timepoint-level metrics table generated from the GTFS records table.

gtfs_route_metrics: Initial route-level metrics table generated from the GTFS records table.

stop_spacing(shapes): Stop spacing in ft. Distance is returned from Valhalla trace route requests in unit of kilometers.

scheduled_headway(): Scheduled headway in minutes. Defined as the difference between two consecutive scheduled arrivals of a route at the first stop of a stop pair.

scheduled_running_time(): Running time in minutes. Defined as the difference between the departure time at a stop and arrival time at the next stop.

scheduled_speed_without_dwell(): Scheduled running speed in mph. Defined as stop spacing divided by running time.
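Since stop spacing is in feet and running time is in minutes, computing a speed in mph involves two unit conversions. The following sketch assumes that a non-positive running time yields no speed; how the module handles that case is not stated here.

```python
FEET_PER_MILE = 5280
MINUTES_PER_HOUR = 60


def speed_mph(stop_spacing_ft, running_time_min):
    """Speed in mph from a spacing in feet and a running time in minutes."""
    if running_time_min <= 0:
        # Assumption: speed is undefined for zero or negative running time.
        return None
    miles = stop_spacing_ft / FEET_PER_MILE
    hours = running_time_min / MINUTES_PER_HOUR
    return miles / hours


# Half a mile covered in two minutes is roughly 15 mph.
speed = speed_mph(2640, 2)
```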

observed_headway(): Observed headway in minutes. Defined as the difference between two consecutive observed arrivals of a route at the first stop of a stop pair on each day, then averaged over all service dates.

observed_running_time(): Observed running time without dwell in minutes. Defined as the time between departure at a stop and arrival at the next stop averaged over all service dates for each bus trip.

observed_speed_without_dwell(): Observed running speed without dwell in mph. Defined as stop spacing divided by the observed running time without dwell.

observed_running_time_with_dwell(): Observed running time with dwell in minutes. Defined as the time between arrival at a stop and arrival at the next stop averaged over all service dates for each bus trip.

observed_speed_with_dwell(): Observed running speed with dwell in mph. Defined as stop spacing divided by the observed running time with dwell.

boardings(): Boardings in pax. Defined as the number of passengers boarding the bus at each stop averaged over all service dates for each bus trip.

on_time_performance(no_earlier_than=-1, no_later_than=5, route_metric_bases: str = 'timepoint')

On-time performance, given as seconds of delay (actual arrival - scheduled arrival) for stop segments and as the percentage of stops on time per trip for routes, averaged over all service dates for each bus trip.

Parameters:

no_earlier_than (int, optional) – number of minutes a bus may arrive early and still be considered on time. Must be negative, defaults to -1
no_later_than (int, optional) – number of minutes a bus may arrive late and still be considered on time. Must be positive, defaults to 5
route_metric_bases (str) – whether the route on-time performance is calculated by counting the number of timepoints on time or the number of stops on time, defaults to ‘timepoint’

Raises:

ValueError – no_earlier_than is positive or no_later_than is negative
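The on-time window can be sketched as follows. The helper names are hypothetical, and treating both window boundaries as on time is an assumption.

```python
def is_on_time(delay_seconds, no_earlier_than=-1, no_later_than=5):
    """Return True if a delay (actual - scheduled arrival) is within the window.

    The thresholds are in minutes; the delay is in seconds.
    """
    if no_earlier_than > 0 or no_later_than < 0:
        raise ValueError("no_earlier_than must not be positive and "
                         "no_later_than must not be negative")
    # Assumption: both boundaries count as on time.
    return no_earlier_than * 60 <= delay_seconds <= no_later_than * 60


def percent_on_time(delays_seconds, **thresholds):
    """Percentage of stop (or timepoint) arrivals of a trip that are on time."""
    flags = [is_on_time(delay, **thresholds) for delay in delays_seconds]
    return 100 * sum(flags) / len(flags)


# Hypothetical delays in seconds at five stops of one trip.
trip_delays = [-90, -30, 0, 240, 420]
pct = percent_on_time(trip_delays)
```

With the default window of 1 minute early to 5 minutes late, the first arrival (90 s early) and the last (7 min late) fall outside the window, so three of the five arrivals count as on time.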

passenger_load(): Passenger load in pax. Defined as the number of passengers onboard the bus within each stop pair, averaged over all service dates for each bus trip.

crowding(): Crowding in percentage. Defined as the percent of passenger load over seated capacity for stop metrics, and percent of peak load over seated capacity for route metrics, averaged over all service dates for each bus trip.

congestion_delay(): Vehicle congestion delay in min/mile and passenger congestion delay in pax-min/mile.

metric_aggregation module

class backend.metrics.metric_aggregation.Metric_Aggregation(metrics: Metric_Calculation, params: ROVE_params)

Bases: object

Aggregated stop, stop-aggregated, route, timepoint, and timepoint-aggregated level metrics.

Parameters:

metrics (Metric_Calculation) – calculated GTFS (and AVL) metrics at all levels
params (ROVE_params) – a rove_params object that stores information needed throughout the backend

segments: pandas.DataFrame: Initial stop-level aggregated metrics table generated from gtfs_stop_metrics, contains unique records of route_id + stop_pair.

corridors: pandas.DataFrame: Initial stop-aggregated-level aggregated metrics table generated from gtfs_stop_metrics, contains unique records of stop_pair.

routes: pandas.DataFrame: Initial route-level aggregated metrics table generated from gtfs_route_metrics, contains unique records of route_id + direction_id.

tpbp_segments: pandas.DataFrame: Initial timepoint-level aggregated metrics table generated from gtfs_tpbp_metrics, contains unique records of route_id + timepoint stop_pair.

tpbp_corridors: pandas.DataFrame: Initial timepoint-aggregated-level aggregated metrics table generated from gtfs_tpbp_metrics, contains unique records of timepoint stop_pair.

percentiles: Dict[str, int]: A dict of two percentile values used for data aggregation, median (50th percentile) and 90th percentile.

speed_range: A dict of minimum and maximum speeds that bound the calculated speeds.

time_dict: Dict[str, Dict]: A dict of time periods for data aggregation, retrieved from the “time_periods” object in backend_config.

aggregate_metrics(percentile: int)

All metrics aggregation methods. Can be overridden by a child class to add more methods.

Parameters:

percentile (int) – percentile of metrics that is returned, e.g. 50 -> median, 90 -> worst decile
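To illustrate what the percentile argument selects, here is a sketch using the standard library. The linear interpolation used here is an assumption; the interpolation rule applied by the module is not documented in this section.

```python
from statistics import quantiles


def percentile_of(values, percentile):
    """Return the given percentile (1 to 99) with linear interpolation."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cut_points = quantiles(values, n=100, method="inclusive")
    return cut_points[percentile - 1]


# Hypothetical running times in minutes of five trips on one stop pair.
running_times = [10, 12, 15, 20, 30]
median = percentile_of(running_times, 50)
p90 = percentile_of(running_times, 90)
```

For a metric such as running time, where larger values are worse, the 90th percentile lies well above the median and reflects the slowest trips.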

aggregate_by_start_end_time(start_time: List, end_time: List, percentile: int)

Given a start_time and end_time, filter each metrics table to keep only stop arrivals within the time window, or trips that depart from the first stop within the time window, then calculate each time-dependent metric using the time-filtered metrics table. A non-time-dependent metric is one that does not change across trips, such as stop spacing. All other metrics are time-dependent and therefore require the time-filtered metrics for aggregation.

Parameters:

start_time (List) – the time after which trips/stop events are considered for aggregation, given as a list of [hour, minute], e.g. [3, 0] is 3 am, and [25, 0] is 1 am on the next calendar day but within the same operation day
end_time (List) – the time before which trips/stop events are considered for aggregation
percentile (int) – percentile of metrics that is returned, e.g. 50 -> median, 90 -> worst decile

Raises:

TypeError – start_time or end_time is not provided in a list
ValueError – end_time is earlier than start_time
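The time-window filter and its two error conditions can be sketched as follows. The event layout (dicts with an arrival_min field), the helper names, and the choice of an inclusive start and exclusive end are all assumptions made for illustration.

```python
def to_minutes(time_of_day):
    """Convert [hour, minute] to minutes since the start of the operation day."""
    if not isinstance(time_of_day, list):
        raise TypeError("time must be given as a list of [hour, minute]")
    hour, minute = time_of_day
    return hour * 60 + minute


def filter_by_time_window(events, start_time, end_time):
    """Keep the events whose arrival falls within [start_time, end_time)."""
    start_min = to_minutes(start_time)
    end_min = to_minutes(end_time)
    if end_min < start_min:
        raise ValueError("end_time is earlier than start_time")
    # Assumption: the window includes its start and excludes its end.
    return [e for e in events if start_min <= e["arrival_min"] < end_min]


events = [
    {"trip_id": "a", "arrival_min": 170},   # 2:50 am
    {"trip_id": "b", "arrival_min": 185},   # 3:05 am
    {"trip_id": "c", "arrival_min": 1490},  # 24:50, after midnight
    {"trip_id": "d", "arrival_min": 1500},  # 25:00, after midnight
]
kept = filter_by_time_window(events, [3, 0], [25, 0])
kept_ids = [e["trip_id"] for e in kept]
```

Hours above 24 let a single window span midnight without wrapping, which keeps the comparison a simple numeric one.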

aggregate_by_10min_intervals(output_path: str): Generate aggregation output for every 10-min interval of the day and write the results to a pickle file as a dict. Each key is a 10-min interval of the full day (defined in the frontend config file under ‘PeriodRanges’ -> ‘full’). Each value is a dict whose keys are aggregation percentiles (e.g. 50 or 90) and whose values are tuples of five dataframes, one each for the aggregated stop, stop-aggregated, route, timepoint, and timepoint-aggregated metrics.

aggregate_by_time_periods(output_path: str): Generate aggregation output by pre-defined time periods and write the results to a pickle file as a dict. Each key is the string concatenation “time period name”-“aggregation level”-“percentile”, e.g. am_peak-segment-50, where segment means stop level, corridor means stop-aggregated level, segment-timepoints means timepoint level, and corridor-timepoints means timepoint-aggregated level. Each value is the corresponding aggregated metrics table normalized to the JSON format.
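The key naming scheme can be sketched as below. The helper period_key is hypothetical, and only the four level names spelled out in this entry are used.

```python
# Level names taken from the description of aggregate_by_time_periods.
LEVELS = ["segment", "corridor", "segment-timepoints", "corridor-timepoints"]
PERCENTILES = [50, 90]


def period_key(period_name, level, percentile):
    """Build a key of the form <time period name>-<level>-<percentile>."""
    return f"{period_name}-{level}-{percentile}"


keys = [period_key("am_peak", level, percentile)
        for level in LEVELS
        for percentile in PERCENTILES]
```

Because some level names contain a hyphen themselves, a key is easier to build than to split; code that reads the output would do well to construct the expected key and look it up.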

sample_size()

stop_spacing()

Aggregated stop spacing in ft. This metric is not time-dependent, so use non-time-filtered metrics for calculations.

stop/stop-aggregated level: stop_spacing of stop pairs averaged over all trips
routes level: sum of stop_spacing of all stops along a route averaged over all trips
timepoint/timepoint-aggregated level: stop_spacing of timepoint pairs averaged over all trips

service_start_end()

Aggregated service start/end in hour.

stop/stop-aggregated/timepoint/timepoint-aggregated level: the first arrival at first stop of the pair and the last arrival at the first stop of the pair
routes level: the first arrival at first stop (service start) and the last arrival at last stop (service end) of all trips

revenue_hour()

Aggregated revenue hours in hr.

all aggregation levels: the time lapse between service_end and service_start

headway(percentile: int, data_type: str)

Aggregated scheduled or observed headway in minutes.

all aggregation levels: the average or mode or percentile of headways of all trips

Parameters:

percentile (int) – the percentile that metrics are aggregated at
data_type (str) – ‘scheduled’ or ‘observed’

frequency(data_type: str)

Aggregated scheduled or observed frequency in trips/hr.

all aggregation levels: the number of trips divided by service span (revenue hour)

Parameters:

data_type (str) – ‘scheduled’ or ‘observed’
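A minimal sketch of the frequency calculation, assuming service start and end are given in decimal hours. The function names are hypothetical, and returning None for an empty service span is an assumption.

```python
def revenue_hours(service_start_hr, service_end_hr):
    """Time elapsed between service start and service end, in hours."""
    return service_end_hr - service_start_hr


def frequency_trips_per_hour(trip_count, service_start_hr, service_end_hr):
    """Number of trips divided by the service span."""
    span = revenue_hours(service_start_hr, service_end_hr)
    if span <= 0:
        # Assumption: frequency is undefined for an empty service span.
        return None
    return trip_count / span


# 36 trips between 6:00 and 24:00.
freq = frequency_trips_per_hour(36, 6.0, 24.0)
```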

running_time(percentile: int, data_type: str)

Aggregated scheduled or observed running time in minutes.

stop/stop-aggregated level: running time between each stop pair averaged over all trips

routes level: sum of running time between all stop pairs along a route averaged over all trips

timepoint/timepoint-aggregated level: running time between each timepoint pair averaged over all trips

Parameters:

percentile (int) – the percentile that metrics are aggregated at
data_type (str) – ‘scheduled’ or ‘observed’

speed(percentile: int, data_type: str, dwell: str)

Aggregated scheduled or observed running speed with or without dwell in mph.

stop/stop-aggregated level: running speed between each stop pair averaged over all trips

routes level: (sum of stop spacing of all stops) / (sum of running time of all stops) along a route averaged over all trips

timepoint/timepoint-aggregated level: running speed between each timepoint pair averaged over all trips

Parameters:

percentile (int) – the percentile that metrics are aggregated at
data_type (str) – ‘scheduled’ or ‘observed’
dwell (str, optional) – ‘with_dwell’ or ‘’ (empty string means without dwell), defaults to ‘’
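At the route level, speed is a ratio of sums rather than an average of per-segment speeds, and the two generally differ. The sketch below uses made-up spacings and running times for one trip, with hypothetical function names.

```python
FEET_PER_MILE = 5280
MINUTES_PER_HOUR = 60


def route_speed_mph(stop_spacings_ft, running_times_min):
    """(sum of stop spacing) / (sum of running time), converted to mph."""
    total_miles = sum(stop_spacings_ft) / FEET_PER_MILE
    total_hours = sum(running_times_min) / MINUTES_PER_HOUR
    return total_miles / total_hours


def segment_speeds_mph(stop_spacings_ft, running_times_min):
    """Speed of each stop pair, in mph."""
    return [(ft / FEET_PER_MILE) / (minutes / MINUTES_PER_HOUR)
            for ft, minutes in zip(stop_spacings_ft, running_times_min)]


spacings_ft = [1320, 2640, 1320]  # one mile in total
times_min = [1.0, 3.0, 2.0]       # six minutes in total

route_speed = route_speed_mph(spacings_ft, times_min)
segment_speeds = segment_speeds_mph(spacings_ft, times_min)
mean_segment_speed = sum(segment_speeds) / len(segment_speeds)
```

Here the route speed is about 10 mph while the mean of the segment speeds is higher, because the simple mean gives the short, fast first segment the same weight as the long middle one.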

wait_time(data_type: str)

Aggregated Poisson wait time in minutes. Wait time values are capped at 300 min (5 hr).

stop level: headway mean / 2 + variance / (2 * mean), assuming passenger arrival follows a Poisson process. See Equation 2.66 in Larson, R. C. & Odoni, A. R. (1981) Urban operations research. Englewood Cliffs, N.J: Prentice-Hall.

Parameters:

data_type (str) – ‘scheduled’ or ‘observed’
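The wait time formula and the cap can be sketched as follows. The function name is hypothetical, and using the population variance (rather than the sample variance) is an assumption.

```python
from statistics import mean, pvariance

MAX_WAIT_MIN = 300  # cap of 5 hours stated in this entry


def poisson_wait_time(headways_min):
    """Expected wait: mean / 2 + variance / (2 * mean), capped at 300 min."""
    mu = mean(headways_min)
    # Assumption: population variance of the headways.
    var = pvariance(headways_min, mu)
    wait = mu / 2 + var / (2 * mu)
    return min(wait, MAX_WAIT_MIN)


# Perfectly even 10-min headways versus uneven ones with the same mean.
scheduled_wait = poisson_wait_time([10.0, 10.0, 10.0])
observed_wait = poisson_wait_time([5.0, 15.0, 10.0])
```

With even headways the variance term vanishes and the wait is half the headway. Uneven headways with the same mean add the variance term, so the expected wait grows even though the same number of buses run.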

excess_wait_time()

Excess Poisson wait time in minutes.

stop level: observed Poisson wait time - scheduled Poisson wait time

boardings(percentile: int)

Aggregated boardings in pax.

stop/stop-aggregated level: boardings at the first stop of a stop pair averaged over all trips

routes level: the sum of boardings at all stops along a route averaged over all trips

timepoint/timepoint-aggregated level: boardings at the first stop of a timepoint pair averaged over all trips

Parameters:

percentile (int) – the percentile that metrics are aggregated at

on_time_performance()

Aggregated on-time performance in seconds or %.

stop level: arrival delay in seconds at the first stop of a stop pair averaged over all trips
routes level: percent of on-time arrivals among all stops along a trip averaged over all trips

crowding()

Aggregated crowding in %.

stop/stop-aggregated level: crowding level between a stop pair averaged over all trips
route level: peak crowding level of a trip averaged over all trips

passenger_load(percentile: int)

Aggregated passenger load in pax.

stop level: passenger load between a stop pair averaged over all trips

Parameters:

percentile (int) – the percentile that metrics are aggregated at

passenger_flow()

Aggregated passenger flow in pax/hr.

stop level: (sum of passenger load) / (revenue hour) between a stop pair

congestion_delay()

Aggregated vehicle- and passenger-weighted congestion delay in min/mile or pax-min/mile.

stop level: vehicle- and passenger-congestion delays between a stop pair averaged over all trips

productivity()

Productivity in pax/revenue hour.

route level: (sum of passengers that board at all stops of a route) / (revenue hour of the route)
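A minimal sketch of the route-level productivity ratio, with made-up boardings. The function name is hypothetical, and returning None when the revenue hour is not positive is an assumption.

```python
def productivity(boardings_by_stop, revenue_hours):
    """Total boardings of a route divided by its revenue hours."""
    if revenue_hours <= 0:
        # Assumption: productivity is undefined without revenue service.
        return None
    return sum(boardings_by_stop) / revenue_hours


# 240 boardings over 8 revenue hours.
route_productivity = productivity([120, 80, 40], 8.0)
```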