Backend
Introduction
The main use of the ROVE backend is to generate shapes and metrics files from raw transit data. This documentation provides detailed explanations of every module in the backend, its input, functions and output. The purpose of this documentation is to familiarize the reader with the design and workflow of ROVE, so that one can adapt the tool for their own use.
Instructions
As the package has not been published yet, it cannot be installed as a standalone package. For now, the suggested method is to download the code base and run the scripts in a conda environment.
All backend processes start in backend_main.py. First, the user needs to specify a few parameters as shown below. AGENCY
is the name of the agency being analyzed. This is also the name of the directories where the input data should be stored and where output data will
be saved. MONTH and YEAR are 2- and 4-character strings of the month and year to be analyzed, e.g. to analyze metrics for Feb 2021, “02” and “2021” should
be used. DATE_TYPE is the type of day that the user wants to analyze, namely Workday (which excludes weekends and holidays), Saturday or Sunday. DATA_OPTION is
the hyphen-joined string of the input data sources that will be used to generate metrics. Currently, only GTFS and AVL data are supported, so the option is either ‘GTFS’ or ‘GTFS-AVL’.
Then, the user specifies which backend modules to run: SHAPE_GENERATION, METRIC_CAL_AGG, or both. Being able to choose which module to run is useful
when testing the backend.
AGENCY = "WMATA" # CTA, MBTA, WMATA, etc...
MONTH = "02" # MM in string format
YEAR = "2021" # YYYY in string format
DATE_TYPE = "Workday" # Workday, Saturday, Sunday
DATA_OPTION = 'GTFS-AVL' # GTFS, GTFS-AVL
SHAPE_GENERATION = True # True/False: whether to generate shapes
METRIC_CAL_AGG = False # True/False: whether to run metric calculation and aggregation
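As a rough illustration of how these parameters relate to the input files described under Input Data Requirements, the sketch below checks the parameter values and derives the expected file locations. The function name `expected_input_files` is hypothetical and not part of ROVE, and using the AGENCY string unchanged for both the directory and the file name is an assumption.

```python
from pathlib import PurePosixPath

VALID_DATE_TYPES = {"Workday", "Saturday", "Sunday"}
VALID_DATA_OPTIONS = {"GTFS", "GTFS-AVL"}


def expected_input_files(agency, month, year, date_type, data_option):
    """Check the run parameters and return the input files they imply.

    Hypothetical helper for illustration only; ROVE builds its own paths
    inside ROVE_params.
    """
    if len(month) != 2 or not month.isdigit() or not 1 <= int(month) <= 12:
        raise ValueError(f"MONTH must be 'MM' from '01' to '12', got {month!r}")
    if len(year) != 4 or not year.isdigit():
        raise ValueError(f"YEAR must be 'YYYY', got {year!r}")
    if date_type not in VALID_DATE_TYPES:
        raise ValueError(f"unsupported DATE_TYPE {date_type!r}")
    if data_option not in VALID_DATA_OPTIONS:
        raise ValueError(f"unsupported DATA_OPTION {data_option!r}")

    # Assumption: the directory name is the AGENCY string as given.
    data_dir = PurePosixPath("backend", "data", agency)
    suffix = f"{agency}_{month}_{year}"
    files = {"gtfs": data_dir / "gtfs" / f"GTFS_{suffix}.zip"}
    if "AVL" in data_option.split("-"):
        files["avl"] = data_dir / "avl" / f"AVL_{suffix}.csv"
    return files


files = expected_input_files("WMATA", "02", "2021", "Workday", "GTFS-AVL")
for source, path in files.items():
    print(source, path)
```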
Workflow
The following sections describe the workflow of the backend step by step.
Parameter Storing
First, the parameters specified above are passed to and stored in a ROVE_params object. These parameters,
along with others generated within the class object (e.g. list of analysis dates, paths to input and output files, config parameters, etc.), are used
throughout the backend. Users can create a child class by inheriting ROVE_params and use customized attributes or class methods, such as customized
input_paths for where the input files are stored (be careful with changing the output_paths attribute, since that might impact file loading on
the frontend), or a customized generate_date_list() method that defines how the date list is selected.
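To make the idea of a customized date selection more concrete, here is a minimal standalone sketch of how a date list for one month could be built. It is not the ROVE_params.generate_date_list() method itself: the function signature and the `holidays` argument are assumptions for illustration, and holidays are only excluded from the Workday list.

```python
import calendar
from datetime import date


def generate_date_list(year, month, date_type, holidays=()):
    """Return the dates in a month that match the requested day type.

    Illustrative stand-in; `holidays` is a caller-supplied iterable of
    datetime.date objects (an assumption, not ROVE's interface).
    """
    holidays = set(holidays)
    days_in_month = calendar.monthrange(year, month)[1]
    selected = []
    for day in range(1, days_in_month + 1):
        current = date(year, month, day)
        weekday = current.weekday()  # Monday is 0, Sunday is 6
        if date_type == "Workday":
            keep = weekday < 5 and current not in holidays
        elif date_type == "Saturday":
            keep = weekday == 5
        elif date_type == "Sunday":
            keep = weekday == 6
        else:
            raise ValueError(f"unsupported date type {date_type!r}")
        if keep:
            selected.append(current)
    return selected


workdays = generate_date_list(2021, 2, "Workday", holidays=[date(2021, 2, 15)])
print(len(workdays), workdays[0], workdays[-1])
```

A child class of ROVE_params could wrap logic like this in its own generate_date_list() method, for example to restrict the analysis to a single week.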
Loading and Validation of Input Data
Then, depending on the DATA_OPTION, the backend processes the GTFS and optionally AVL data using the GTFS and
AVL objects. Each data class contains methods that are responsible for loading the raw data from a file path, as well as validating the loaded raw data
to make sure the data table(s) and columns meet the specifications described in Input Data Requirements.
In the GTFS object, two of the most important attributes are GTFS.records
(a joined GTFS stop_times and trips table with some extra columns) that is used for the calculation and aggregation of scheduled metrics, and
GTFS.patterns_dict that is used for shape generation. Similarly, the most important attribute in AVL is
AVL.records that is used for the calculation and aggregation of observed metrics.
Shape Generation
Next, the backend enters the Shape Generation module using the class BaseShape. A GTFS.patterns_dict and an output path to the
shapes JSON file are used to initialize a BaseShape object, which contains an attribute shapes that is a data table containing all stop-pair
shapes information. Note that the attribute shapes stores exactly the same information as the output shapes JSON file, but in a DataFrame format.
Metric Calculation and Aggregation
The shapes, GTFS.records, AVL.records and data_option from above are used to generate calculated metrics stored in a
Metric_Calculation object. Specifically, shapes is used to provide information on stop spacing. GTFS.records and AVL.records are
used to generate scheduled and observed metrics, respectively. data_option is used to decide which metrics to calculate. In
this module, metrics from each trip are calculated on the stop, timepoint and route levels, and are averaged over all service dates for the same trip if the ‘GTFS-AVL’ option
is selected and multiple days’ AVL data is provided.
These metrics are then processed in the Metric Aggregation module, where metrics of different trips for the same stop pair, timepoint pair, or route are averaged. Metrics are
aggregated on stop, stop-aggregated, timepoint, timepoint-aggregated and route levels (different levels have different sets of metrics; see Metric_Aggregation for details).
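The two averaging steps described above can be sketched with plain Python. This is only an illustration of the idea under simplifying assumptions: the record layout `(trip_id, stop_pair, service_date, value)` and both function names are hypothetical, and ROVE itself works on DataFrames inside Metric_Calculation and Metric_Aggregation.

```python
from collections import defaultdict
from statistics import mean


def average_over_dates(observations):
    """Step 1: average each trip's metric over all of its service dates."""
    by_trip = defaultdict(list)
    for trip_id, stop_pair, _service_date, value in observations:
        by_trip[(trip_id, stop_pair)].append(value)
    return {key: mean(values) for key, values in by_trip.items()}


def aggregate_over_trips(trip_metrics):
    """Step 2: average the per-trip values of every stop pair."""
    by_pair = defaultdict(list)
    for (_trip_id, stop_pair), value in trip_metrics.items():
        by_pair[stop_pair].append(value)
    return {pair: mean(values) for pair, values in by_pair.items()}


# Toy running times in seconds for two trips over the same stop pairs.
observations = [
    ("t1", ("64", "1"), "2021-02-01", 60),
    ("t1", ("64", "1"), "2021-02-02", 80),
    ("t2", ("64", "1"), "2021-02-01", 90),
    ("t2", ("1", "2"), "2021-02-01", 120),
]
per_trip = average_over_dates(observations)
per_pair = aggregate_over_trips(per_trip)
print(per_pair)
```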
Input Data Requirements
The current implementation of ROVE supports two data sources, GTFS (GTFS static data) and AVL data.
Warning
For ROVE to process either data source, the data must follow the requirements outlined below. Data that does not comply with the requirements will likely not work in ROVE, and may result in errors or wrong metric calculations.
GTFS
The GTFS data must be a zipped file (.zip) containing GTFS tables in separate text files (.txt). The GTFS zipped file must be located in the backend\data\<agency>\gtfs\
folder and named GTFS_<AGENCY>_<MONTH>_<YEAR>.zip, e.g. GTFS_MBTA_02_2021.zip. GTFS data should follow the Reference for static GTFS data. As documented in GTFS,
ROVE by default requires that the zipped GTFS data file contain the following data tables and columns.
| Table | Columns |
|---|---|
| stops.txt | stop_id, stop_code, stop_name, stop_lat, stop_lon |
| routes.txt | route_id, route_type |
| trips.txt | route_id, service_id, trip_id, direction_id |
| stop_times.txt | trip_id, arrival_time, departure_time, stop_id, stop_sequence |
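A quick pre-check of a GTFS archive against the table above could look like the following sketch. It is not the GTFS.validate_data() method; the function name `find_missing_gtfs_columns` is hypothetical, only header rows are inspected, and the stop times table is looked up as stop_times.txt, the file name defined by the static GTFS reference.

```python
import csv
import io
import zipfile

REQUIRED_GTFS_COLUMNS = {
    "stops.txt": {"stop_id", "stop_code", "stop_name", "stop_lat", "stop_lon"},
    "routes.txt": {"route_id", "route_type"},
    "trips.txt": {"route_id", "service_id", "trip_id", "direction_id"},
    "stop_times.txt": {
        "trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence",
    },
}


def find_missing_gtfs_columns(zip_source):
    """Return {table: missing columns} for a GTFS zip path or file object.

    A table that is absent from the archive is reported with all of its
    required columns. An empty dict means the header check passed.
    """
    missing = {}
    with zipfile.ZipFile(zip_source) as archive:
        names = set(archive.namelist())
        for table, required in REQUIRED_GTFS_COLUMNS.items():
            if table not in names:
                missing[table] = set(required)
                continue
            with archive.open(table) as raw:
                # utf-8-sig drops a byte order mark if the feed has one.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                header = next(csv.reader(text), [])
            absent = required - {name.strip() for name in header}
            if absent:
                missing[table] = absent
    return missing
```

Calling `find_missing_gtfs_columns("backend/data/MBTA/gtfs/GTFS_MBTA_02_2021.zip")` would then list any table or column that still needs to be supplied before running the backend.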
AVL
If the user wishes to calculate observed metrics, then an AVL data table must be supplied, and the DATA_OPTION must be specified as GTFS-AVL.
The AVL data must be a comma-separated values file (.csv) containing a combination of stop-level Automatic Passenger Counter (APC) and Automatic Vehicle Location (AVL)
records. The AVL file must be located in the backend\data\<agency>\avl\
folder and named AVL_<AGENCY>_<MONTH>_<YEAR>.csv, e.g. AVL_MBTA_02_2021.csv. Since different transit agencies use different systems and devices to record
AVL data, ROVE requires that the input AVL data follow a standard format containing the specific columns detailed below.
| Column | Definition |
|---|---|
| route | route ID, must be consistent with GTFS route_id |
| stop_id | stop ID, must be consistent with GTFS stop_id (preferred) or stop_code |
| stop_time | date and time of the stop event |
| stop_sequence | sequence of the stop in a trip |
| dwell_time | dwell time at the stop in integer seconds |
| passenger_load | number of passengers on the bus after leaving the stop |
| passenger_on | number of passengers that boarded the bus at the stop |
| passenger_off | number of passengers that alighted the bus at the stop |
| seat_capacity | number of seats on the bus |
| trip_id | trip ID, must be consistent with GTFS trip_id |
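Before running the backend with the GTFS-AVL option, the AVL file can be screened with a small script such as the one below. This is a sketch rather than the validation performed by the AVL class: the function name `check_avl` is hypothetical, and only the column names and the integer dwell_time requirement from the table above are checked.

```python
import csv
import io

REQUIRED_AVL_COLUMNS = [
    "route", "stop_id", "stop_time", "stop_sequence", "dwell_time",
    "passenger_load", "passenger_on", "passenger_off", "seat_capacity",
    "trip_id",
]


def check_avl(csv_file):
    """Return (missing columns, line numbers with a non-integer dwell_time)."""
    reader = csv.DictReader(csv_file)
    present = reader.fieldnames or []
    missing = [name for name in REQUIRED_AVL_COLUMNS if name not in present]
    if missing:
        return missing, []
    bad_lines = []
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        try:
            int(row["dwell_time"])
        except (TypeError, ValueError):
            bad_lines.append(line_number)
    return missing, bad_lines


# Toy two-row file: the second data row has a fractional dwell time.
sample = io.StringIO(
    ",".join(REQUIRED_AVL_COLUMNS) + "\n"
    "1,64,2021-02-01 06:00:00,1,30,12,3,1,40,t1\n"
    "1,1,2021-02-01 06:02:00,2,12.5,14,2,0,40,t1\n"
)
print(check_avl(sample))
```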
Backend Configuration File
The backend config data must be a JSON file (.json) containing the agency-specific parameters listed below. The backend config file must be located in the backend\data\<agency>\
folder and named config.json (not to be confused with the frontend config file, which has the same name but is stored in the frontend directory).
An example of the backend config JSON file is given below (the format of the sample snippet is condensed to save space).
{
"time_periods": {
"full": [
[3, 0],
[27, 0]
],
"am_peak": [
[5, 0],
[9, 0]
],
"midday": [
[9, 0],
[15, 0]
],
"pm_peak": [
[15, 0],
[19, 0]
]
},
"speed_range": {
"min": 0,
"max": 65
},
"workalendarPath": "workalendar.usa.massachusetts.Massachusetts",
"route_type": {
"bus": [
"3"
]
}
}
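The sketch below shows one way the time_periods entries could be interpreted. It assumes that each bound is an [hour, minute] pair measured from the start of the service day, so that [27, 0] means 3:00 am on the following calendar day (the same convention GTFS uses for times past 24:00:00), and that a period includes its start but not its end. These are assumptions made for illustration, and the helper names are hypothetical.

```python
import json

CONFIG_TEXT = """
{
  "time_periods": {
    "full": [[3, 0], [27, 0]],
    "am_peak": [[5, 0], [9, 0]],
    "midday": [[9, 0], [15, 0]],
    "pm_peak": [[15, 0], [19, 0]]
  }
}
"""


def to_seconds(hour_minute):
    """Convert an [hour, minute] pair to seconds since the service day began."""
    hour, minute = hour_minute
    return hour * 3600 + minute * 60


def periods_containing(time_periods, gtfs_time):
    """List the periods that contain a GTFS-style "HH:MM:SS" time."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    moment = hours * 3600 + minutes * 60 + seconds
    return [
        name
        for name, (start, end) in time_periods.items()
        if to_seconds(start) <= moment < to_seconds(end)
    ]


time_periods = json.loads(CONFIG_TEXT)["time_periods"]
print(periods_containing(time_periods, "08:30:00"))
print(periods_containing(time_periods, "25:15:00"))
```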
Output Data Formats
Shapes JSON File
The Shape Generation module outputs a JSON file that contains geometry information of each stop pair of each route pattern found in
the GTFS data. The JSON file is saved in the frontend/static/inputs/<agency>/shapes/ directory, and named
bus-shapes_<AGENCY>_<MONTH>_<YEAR>.json. A sample snippet of the shapes JSON file is shown here.
{
{
"geometry": "onrvoAfurqfCvB_e@yQkC}KiAiKaBkH{A{I}CmI{J??SU",
"route_id": "1",
"direction": 0,
"seg_index": "1-64-1",
"stop_pair": [
"64",
"1"
],
"distance": 0.14,
"pattern": "1-0-1",
"mode": "bus",
"timepoint_index": "1-0-1-1"
},
{
"geometry": "cvtvoAbqpqfCyBkCwRoTyVkZwJqL{S}QsDsD??_@]",
"route_id": "1",
"direction": 0,
"seg_index": "1-1-2",
"stop_pair": [
"1",
"2"
],
"distance": 0.173,
"pattern": "1-0-1",
"mode": "bus",
"timepoint_index": "1-0-1-1"
},
}
A quick reference of some name fields in the shapes JSON file is given in the table below. See BaseShape
for detailed definitions of all name fields. Users can use Valhalla’s online tool
to verify the geometry generated by Valhalla and stored by ROVE.
| Name | Definition |
|---|---|
| pattern | string concatenation of “<route_id>-<direction_id>-<pattern_count (ordered number of unique patterns of a route and direction)>” |
| segment_index | concatenation of “<route_id>-<stop ID of the first stop in the stop pair>-<stop ID of the second stop in the stop pair>” (stored under the key seg_index in the sample snippet above) |
| geometry | encoded polyline of the stop pair (six digits, as specified by Valhalla) |
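For readers who want to inspect a geometry string offline, the following sketch decodes an encoded polyline with the standard polyline algorithm at six decimal digits. It is an independent illustration, not code taken from ROVE or Valhalla, and the function name `decode_polyline` is hypothetical.

```python
def decode_polyline(encoded, precision=6):
    """Decode an encoded polyline into a list of (lat, lon) tuples."""
    factor = 10 ** precision
    coordinates = []
    index = 0
    lat = lon = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one latitude delta, then one longitude delta
            shift = result = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift
                shift += 5
                if chunk < 0x20:  # no continuation bit: value is complete
                    break
            # The lowest bit carries the sign of the delta.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coordinates.append((lat / factor, lon / factor))
    return coordinates


# Geometry of the first stop pair in the sample snippet above.
points = decode_polyline("onrvoAfurqfCvB_e@yQkC}KiAiKaBkH{A{I}CmI{J??SU")
print(len(points), points[0])
```

With `precision=5` the same function decodes polylines written in the five-digit variant of the format.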
Timepoints Lookup JSON File
A timepoint lookup file is saved from GTFS, after the static GTFS data is validated.
The JSON file is saved in the frontend/static/inputs/<agency>/timepoints/ directory, and named
timepoints_<AGENCY>_<MONTH>_<YEAR>.json. A sample snippet of the timepoints lookup JSON file is shown here. This lookup table is used
by the frontend to visualize timepoint-level metrics using stop-pair geometries.
{
"1-62-63": [
62,
64
],
"1-63-64": [
62,
64
],
}
For each name-value pair stored in the JSON file, the name is the segment_index of the stop pair, and the value is a list of two values, namely the first and last stop of the timepoint pair that this stop pair belongs to.
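As an illustration of how such a lookup can be used, the sketch below inverts it so that every timepoint pair maps to the stop-pair segments it covers, which is the grouping needed to draw a timepoint-level metric with stop-pair geometries. The function name `segments_by_timepoint_pair` is hypothetical, and the frontend may organize this differently.

```python
from collections import defaultdict


def segments_by_timepoint_pair(lookup):
    """Group segment indices by the (first stop, last stop) timepoint pair."""
    groups = defaultdict(list)
    for segment_index, (first_stop, last_stop) in lookup.items():
        groups[(first_stop, last_stop)].append(segment_index)
    return dict(groups)


# Same content as the sample snippet above.
lookup = {
    "1-62-63": [62, 64],
    "1-63-64": [62, 64],
}
print(segments_by_timepoint_pair(lookup))
```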
Stop Name JSON File
A stop name lookup file is saved from GTFS.
The JSON file is saved in the frontend/static/inputs/<agency>/lookup/ directory, and named
lookup_<AGENCY>_<MONTH>_<YEAR>.json. A sample snippet of the stop name JSON file is shown here.
{
"62": {
"stop_name": "Washington St @ Williams St",
"municipality": "Boston"
},
"63": {
"stop_name": "Washington St @ Ruggles St",
"municipality": "Boston"
}
}
Aggregated Metric Files
The aggregated metrics are saved in the data/<agency>/metrics/ directory. Two separate pickle files are saved from
the Metric_Aggregation module. The file that stores aggregated metrics by time periods is METRICS_<AGENCY>_<MONTH>_<YEAR>.p.
The file that stores aggregated metrics by 10-min time intervals is METRICS_10MIN_<AGENCY>_<MONTH>_<YEAR>.p.
Details of how each file is generated can be found in the documentation of functions aggregate_by_time_periods()
and aggregate_by_10min_intervals(). In short, before pickling, the time-period metrics file is a dict of JSON,
where each key is the type of metric in the form of “<time period name>-<aggregation level>-<percentile>”, e.g. “am_peak-segment-50” or
“full-corridor-90”, and each value is the corresponding metrics DataFrame normalized to JSON format. On the other hand, the 10-min metrics file
is a nested dict, the format of which is shown below in the snippet containing metrics for two 10-min intervals (6:00 am to 6:10 am, and
6:10 am to 6:20 am).
{
((6, 0), (6, 10)): {
"median": (
stop-level metrics,
stop-aggregated-level metrics,
route-level metrics,
timepoint-level metrics,
timepoint-aggregated-level metrics
),
"90": (
stop-level metrics,
stop-aggregated-level metrics,
route-level metrics,
timepoint-level metrics,
timepoint-aggregated-level metrics
)
},
((6, 10), (6, 20)): {
"median": (..),
"90": (..)
}
}
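The key formats of both files can be handled with a few lines of Python. The sketch below is illustrative only: the helper names are hypothetical, splitting the time-period key on its last two hyphens assumes that the aggregation level and percentile never contain a hyphen, and writing an interval that ends on the hour as (7, 0) is likewise an assumption.

```python
import pickle


def parse_metric_key(key):
    """Split "<time period name>-<aggregation level>-<percentile>"."""
    period, level, percentile = key.rsplit("-", 2)
    return period, level, int(percentile)


def interval_key(hour, minute):
    """Return the ((h, m), (h, m)) key of the 10-min interval holding a time."""
    start_minute = minute - minute % 10
    end_total = hour * 60 + start_minute + 10
    return ((hour, start_minute), (end_total // 60, end_total % 60))


# Toy stand-in for the 10-min metrics file; real values are metric tables.
toy_metrics = {
    ((6, 0), (6, 10)): {"median": (), "90": ()},
    ((6, 10), (6, 20)): {"median": (), "90": ()},
}
restored = pickle.loads(pickle.dumps(toy_metrics))

print(parse_metric_key("am_peak-segment-50"))
print(interval_key(6, 14) in restored)
```

Tuple keys survive pickling unchanged, so the nested dict can be indexed directly with the result of `interval_key` after loading.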
Modules
- Data Class
  - rove_parameters module
    - ROVE_params: agency, month, year, date_type, data_option, suffix, input_paths, output_paths, redValues, backend_config, date_list, get_iso3166_code(), get_backend_config(), get_frontend_config(), get_transitFileProp_or_vizFileProp(), generate_date_list()
  - gtfs module
    - GTFS: REQUIRED_DATA_SPEC, OPTIONAL_DATA_SPEC, mode, alias, rove_params, raw_data, validated_data, records, patterns_dict, load_data(), validate_data(), get_gtfs_records(), add_timepoints(), add_branchpoints(), generate_patterns(), improve_pattern_with_shapes(), generate_timepoints_output(), generate_stop_name_output()
  - avl module
- Shape Generation
- Metrics Calculation and Aggregation
  - metric_calculation module
    - Metric_Calculation: gtfs_stop_metrics, gtfs_tpbp_metrics, gtfs_route_metrics, stop_spacing(), scheduled_headway(), scheduled_running_time(), scheduled_speed_without_dwell(), observed_headway(), observed_running_time(), observed_speed_without_dwell(), observed_running_time_with_dwell(), observed_speed_with_dwell(), boardings(), on_time_performance(), passenger_load(), crowding(), congestion_delay()
  - metric_aggregation module
    - Metric_Aggregation: segments, corridors, routes, tpbp_segments, tpbp_corridors, percentiles, speed_range, time_dict, aggregate_metrics(), aggregate_by_start_end_time(), aggregate_by_10min_intervals(), aggregate_by_time_periods(), sample_size(), stop_spacing(), service_start_end(), revenue_hour(), headway(), frequency(), running_time(), speed(), wait_time(), excess_wait_time(), boardings(), on_time_performance(), crowding(), passenger_load(), passenger_flow(), congestion_delay(), productivity()