Our newest product, myTrip, serves smaller bus operators all over the UK, allowing users to see bus stops, the routes that serve them, and vehicles tracked in real time. Crucially, users can also buy tickets to use on those buses.
We’ve been working on myTrip at pace over the last few months. As the Passenger platform grows, supporting more operator apps and websites, we are continuing to invest in scaling our systems and processes to meet demand.
This post gives some insight into what goes on behind the scenes to enable a seemingly small app feature, location search for myTrip, and into how overcoming new challenges for myTrip is already improving the data infrastructure for every customer on the Passenger platform.
myTrip’s UK-wide coverage instantly made it the largest app deployment by geography we have launched to date. For searchable location data, this created challenges not present elsewhere on the platform, even compared with the largest operating area of any of Passenger’s Premium (white-label) customers.
Users of Passenger’s Premium customer apps, such as NCTX Buses or Transdev Go, will be familiar with a feature that lets them search for locations, including local attractions, in their region. To provide this, we take bus stop data from operators’ TransXChange data and from NaPTAN, and augment it with location data from OpenStreetMap (OSM) using a magic recipe of imports and search indexing.
Until now, our data pipeline, running on AWS Batch, took an OSM extract covering the whole of the UK and cut it into smaller slices, one for each Passenger Premium customer’s operating boundary.
This produced compressed XML files, which were loaded into our systems to power location search in the mobile apps and websites via our APIs.
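To illustrate the idea of slicing by operator boundary, here is a minimal Python sketch that keeps only the nodes falling inside a boundary polygon, using a standard ray-casting point-in-polygon test. It is not the pipeline’s actual code: it assumes each boundary is a single simple polygon of (longitude, latitude) vertices, treats coordinates as planar, and the function names are hypothetical.

```python
# Hypothetical sketch: clip OSM nodes to one operator's boundary polygon.
# Assumes the boundary is a single simple polygon of (lon, lat) vertices and
# treats coordinates as planar, which is adequate for a rough slice.

Coord = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(lon: float, lat: float, polygon: list[Coord]) -> bool:
    """Ray casting: count how many polygon edges a ray heading east crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge meets the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing toggles inside/outside
    return inside


def slice_for_operator(nodes: dict[int, Coord], boundary: list[Coord]) -> dict[int, Coord]:
    """Return only the nodes (id -> (lon, lat)) that lie inside the boundary."""
    return {
        node_id: (lon, lat)
        for node_id, (lon, lat) in nodes.items()
        if point_in_polygon(lon, lat, boundary)
    }
```

An odd number of crossings means the point is inside. Points lying exactly on an edge are not handled specially here, which is usually acceptable for this kind of coarse clipping.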
This process has served us well for a long time, but because myTrip’s operating boundary is the whole of the UK, we chose to disable it for last week’s soft launch of myTrip. It would have taken a significant amount of time and produced an uncompressed XML file of more than 25GB, an order of magnitude larger than the system was designed to handle.
After our pioneer group of myTrip operators began to complete onboarding, Engineering re-evaluated what it might take to provide myTrip users with a better location search than a list of UK bus stops.
Analysing each step of the data pipeline for possible optimisations, we quickly identified the cause of the large file sizes: redundant data.
OSM data consists of nodes, ways and relations:
- You can think of a node as a single point in space, with one set of X and Y coordinates.
- A way is an ordered list of nodes forming a line, commonly used to represent roads.
- Relations are commonly used to group a set of ways/nodes together, e.g. local authority boundaries.
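These three element types map naturally onto a small data model. The dataclasses below are a simplified, hypothetical sketch (the class and field names are our own, not those of any OSM library); real OSM elements also carry metadata such as versions and timestamps. X is longitude and Y is latitude.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single point in space: X is longitude, Y is latitude."""
    id: int
    lon: float
    lat: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node ids forming a line, e.g. a road."""
    id: int
    node_refs: list[int]
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Member:
    """One entry in a relation: the element it points at, and its role."""
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""


@dataclass
class Relation:
    """A group of nodes/ways (or other relations), e.g. a local authority boundary."""
    id: int
    members: list[Member]
    tags: dict[str, str] = field(default_factory=dict)
```

Note that a way stores only references to nodes, so its geometry can't be recovered without the nodes themselves; this is why the pipeline has to read them even when they are not searchable on their own.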
Previously, we processed and exported all of the data within each operator boundary for our systems to import, but only a small subset of each file was actually used. The final step of the pipeline converted all ways and relations into nodes, because our location search is primarily designed around results that are a single set of coordinates. This step used the nodes of a way to determine its centre point, i.e. it reduced a road polyline to a single set of X and Y coordinates.
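The post doesn’t specify exactly how the centre point is calculated. One simple option, assumed in this sketch, is the arithmetic mean of a way’s vertices; for a closed way (whose first and last node are the same) the repeated node is skipped so it isn’t counted twice. The function name is hypothetical.

```python
Coord = tuple[float, float]  # (lon, lat), i.e. (X, Y)


def way_centre(node_refs: list[int], node_coords: dict[int, Coord]) -> Coord:
    """Collapse a way to one point: the mean of its distinct vertices.

    Assumption: the vertex mean is used here for simplicity; the original
    pipeline's exact method isn't stated.
    """
    refs = list(node_refs)
    # A closed way (e.g. a building outline) repeats its first node at the end;
    # drop the duplicate so that vertex isn't double-counted.
    if len(refs) > 1 and refs[0] == refs[-1]:
        refs = refs[:-1]
    if not refs:
        raise ValueError("way has no nodes")
    coords = [node_coords[ref] for ref in refs]
    lon = sum(x for x, _ in coords) / len(coords)
    lat = sum(y for _, y in coords) / len(coords)
    return lon, lat
```

For long or curved roads the vertex mean can fall off the line itself; snapping to the nearest vertex, or taking the point halfway along the line, are common alternatives.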
However, the old pipeline still included the original nodes that made up each way, data that is no longer useful once the centre point has been determined. Using ‘osmfilter’ and ‘osmosis’, we modified the pipeline to reduce the UK dataset to only the ways, nodes and relations that would be imported into the location search index.
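The reduction itself was done with osmfilter and osmosis. As a tool-independent illustration of the idea only, the hypothetical Python sketch below keeps named elements carrying at least one tag key from an assumed `SEARCHABLE_KEYS` set, collapses each kept way to the mean of its vertices, and emits point records. Untagged nodes that exist only as way geometry are read to compute centres but never written out, which is exactly the redundant data described above.

```python
# Hypothetical sketch of the reduction idea, independent of osmfilter/osmosis.
# The searchable-tag rule below is an assumption for illustration only.

SEARCHABLE_KEYS = {"amenity", "tourism", "shop", "leisure", "place", "highway"}

NodeRec = tuple[float, float, dict[str, str]]  # (lon, lat, tags)
WayRec = tuple[list[int], dict[str, str]]      # (node_refs, tags)


def is_searchable(tags: dict[str, str]) -> bool:
    """Keep only named features carrying at least one 'interesting' key."""
    return "name" in tags and any(key in tags for key in SEARCHABLE_KEYS)


def reduce_for_search(nodes: dict[int, NodeRec], ways: dict[int, WayRec]) -> list[dict]:
    """Emit one point record per searchable node or way.

    Geometry-only nodes are used to compute way centres but never emitted.
    """
    points = []
    for node_id, (lon, lat, tags) in nodes.items():
        if is_searchable(tags):
            points.append({"source": f"node/{node_id}", "lon": lon, "lat": lat, "name": tags["name"]})
    for way_id, (refs, tags) in ways.items():
        if not is_searchable(tags):
            continue
        if len(refs) > 1 and refs[0] == refs[-1]:
            refs = refs[:-1]  # closed way: don't count the repeated first node twice
        coords = [nodes[ref][:2] for ref in refs]
        lon = sum(x for x, _ in coords) / len(coords)
        lat = sum(y for _, y in coords) / len(coords)
        points.append({"source": f"way/{way_id}", "lon": lon, "lat": lat, "name": tags["name"]})
    return points
```

In practice, filtering by tags before any geometry work (as command-line OSM tools can do) keeps memory use down, since most nodes in a national extract are untagged geometry.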
This reduced the file from 25GB+ uncompressed to around 500MB, and to 100MB compressed. Huge savings. Although this file is still substantially larger than any of the previous per-operator OSM slices, our systems can easily import this amount of data.
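In ratio terms (taking 1GB as 1,000MB), filtering cut the uncompressed size by a factor of about 50, compression cut it by a further 5×, and the compressed output is roughly 1/250 of the original uncompressed file:

```latex
\frac{25\,\text{GB}}{500\,\text{MB}} = \frac{25\,000\,\text{MB}}{500\,\text{MB}} = 50,
\qquad
\frac{500\,\text{MB}}{100\,\text{MB}} = 5,
\qquad
\frac{25\,000\,\text{MB}}{100\,\text{MB}} = 250
```

Because the original file was more than 25GB, the first and last ratios are lower bounds.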
Previously, the OSM import for all operators took most of the day; importing the new UK-wide file took only a little over an hour, and the final in-memory index was just 1.7GB.
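The post doesn’t describe how the location search index is built. Purely as an illustration of an in-memory name index, here is a minimal Python sketch that answers case-insensitive prefix queries with a sorted list and binary search (`bisect`). The class name and record shape are hypothetical; a production search index would also handle ranking, fuzzy matching and proximity to the user.

```python
import bisect
from collections.abc import Iterable

Entry = tuple[str, float, float]  # (display name, lon, lat)


class PrefixIndex:
    """Hypothetical in-memory index answering case-insensitive prefix queries."""

    def __init__(self, entries: Iterable[Entry]) -> None:
        ordered = sorted(entries, key=lambda e: e[0].casefold())
        self._keys = [name.casefold() for name, _, _ in ordered]  # sorted search keys
        self._values = ordered  # parallel list of the original records

    def search(self, prefix: str, limit: int = 10) -> list[Entry]:
        p = prefix.casefold()
        # Binary search for the first key >= prefix; all matches follow contiguously.
        start = bisect.bisect_left(self._keys, p)
        results: list[Entry] = []
        for i in range(start, len(self._keys)):
            if len(results) >= limit or not self._keys[i].startswith(p):
                break
            results.append(self._values[i])
        return results
```

Because every key starting with a given prefix sorts into one contiguous run, each lookup costs a binary search plus a scan of at most `limit` matches.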
The end result: UK-wide location search for myTrip, just as it should be. You can download it and have a play at mytrip.today
We’ll continue to improve Location Search and evolve it so that it stays useful in whatever context it is used, whether in myTrip or a Premium app and website deployment.
If you enjoyed this window into how we’re designing new products for public transport in the UK, be sure to check out our current job vacancies and our Making Passenger podcast, where we discuss everything transport!