Mapping data is at the heart of Soundscape and there’s a lot to think about!
The map in the UI on iOS is provided by Apple Maps. It’s proprietary, but more importantly it’s free for native apps. The mapping data used for the audio descriptions comes from a custom server. The server code was written by Microsoft and turns OpenStreetMap (OSM) data into GeoJSON tiles at a single zoom level for the whole world. The server is constantly updating from OSM, so it should be up to date with all edits made to OSM data. This means that OSM edits can be made to improve the mapping around a Soundscape user’s location, and those will be reflected very quickly in the Soundscape app. The GeoJSON tiles are downloaded by Soundscape and used to describe the nearby points of interest (POI) and roads. Because there are two sources of information here, the UI and the audio might not always match.
Android doesn’t have a source of free maps for the UI, so we immediately need a new solution there. We chose MapLibre Native Android as the library to render the map UI. It’s very widely used, supports a vast array of tile types and styling, and is still very actively developed. Because we’re now paying for the server costs involved in the map UI, we need to take this into consideration when picking the source of the tile data. The best technology available seems to be Protomaps, which provides the most cost-effective approach to serving mapping tiles. The cost calculator they provide shows that 10 million tiles a month could be served for around £10. The tiles are vector tiles (like compressed JSON) rather than image tiles (e.g. PNG), which means that they can be dynamically styled by MapLibre when it renders them on the phone. This opens up a number of new UI possibilities, but also means that we may be able to switch from using the soundscape-backend server GeoJSON to using the same vector tiles for audio as we do for UI.
This is the big question for server costing: how many tiles will users request from the server every month? The tiles are cached on the phone, and for the audio only the tiles immediately surrounding the user are downloaded, at a single zoom level. For the UI, zooming out and panning around the world means that far more tiles could be consumed. However, the majority of users will likely be using the app within a fairly fixed geography, and the map UI is of limited use to many visually impaired users. Although we know how many tiles we serve from soundscape-backend, we don’t know how many Apple Maps currently serves to the Soundscape iOS app.
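To make "tiles immediately surrounding the user" concrete, here is a minimal sketch of how a location maps to a tile at a single zoom level, using the standard slippy-map (Web Mercator) tile numbering. The function names, and the choice of a 3x3 grid (`radius=1`), are assumptions for illustration and are not taken from the Soundscape code.

```python
import math


def lat_lon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) of the slippy-map tile containing a WGS84 location."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def surrounding_tiles(lat_deg, lon_deg, zoom, radius=1):
    """Tiles in a (2 * radius + 1) square around the user's tile.

    The radius is a hypothetical parameter: 1 gives a 3x3 grid.
    """
    n = 2 ** zoom
    cx, cy = lat_lon_to_tile(lat_deg, lon_deg, zoom)
    tiles = []
    for dy in range(-radius, radius + 1):
        y = cy + dy
        if not 0 <= y < n:
            continue  # there are no tiles beyond the top or bottom row
        for dx in range(-radius, radius + 1):
            tiles.append(((cx + dx) % n, y))  # wrap around the antimeridian
    # At very low zooms wrapping can produce duplicates, so drop them.
    return list(dict.fromkeys(tiles))
```

With a grid like this, the audio side requests at most nine tiles for a stationary user, and the on-phone cache means neighbouring positions mostly reuse them. The UI is different: every zoom level the user visits has its own tile grid, which is why its tile consumption is harder to predict.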
This describes the journey of the data in the vector tiles.
The tile server will be serving up a single PMTiles file. Much of the data in that comes from OpenStreetMap, though there’s other data pulled in too. We’re currently using planetiler to generate the `.pmtiles` file. The contents of the `.pmtiles` file is highly configurable. It contains vector tiles at a range of zoom levels, but the contents of the vector tiles can be configured to have whatever data we want in them.
We use Mapbox Vector Tiles (MVT). They are protobuf encoded and provide a tile of mapping data similar to GeoJSON, but much more compressed.
Decoding from protobuf is handled automatically in Android; all that’s required is a `.proto` file which describes the format. In our case it’s `vector_tile.proto`. However, we still need to decode the vector tile data from its compressed format.
There are two ways in which the vector tiles compress the data:

1. Property keys and values are stored in lookup tables and are referenced by index from each `feature`. This means that each string is only stored once per layer in the tile.
2. Locations are stored as x,y positions within the tile rather than as latitude and longitude. Each tile has an `extents` value which declares the length of its side. The origin of the tile in latitude and longitude is known, and so the x,y position can be easily converted into latitude and longitude.

The default value for `extents` is 4096, which means that the resolution of the locations within the tile varies with zoom level. At zoom level 0 a single tile represents the whole world, at zoom level 1 there are 4 tiles, and so on down through the zooms. The length of a tile’s side in metres (measured at the equator) is therefore the circumference of the earth divided by $2^{\text{zoom-level}}$, i.e. $(2 \times \pi \times 6378137) / 2^{\text{zoom-level}}$. At a zoom of 15 and with `extents` at the default of 4096, that gives a resolution of 30cm, so that’s well within what we need for our mapping calculations.

The file that ends up on the server is a single `.pmtiles` file which consists of all of the vector tiles smashed together into a single file with a metadata index. The server uses that metadata to find each tile so that when a request like:

https://api.protomaps.com/tiles/v3/$tileZoom/$tileX/$tileY.mvt

is received it can return the piece of the `.pmtiles` file corresponding to that tile. Nothing other than the server knows that the vector tile was stored in Protomaps format; the client just receives a regular Mapbox vector tile.
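The two compression schemes described earlier (strings referenced by index, and locations stored as x,y within the tile) can be sketched as follows. This is an illustration, not the app’s decoder. It assumes the protobuf layer has already been parsed into plain lists, it simplifies the typed values of the real format to plain Python values, and all function names are hypothetical. The tag indexing and the zigzag-encoded geometry follow the Mapbox Vector Tile specification.

```python
import math


def decode_properties(tags, keys, values):
    """Resolve a feature's tags against the layer's key and value tables.

    tags holds alternating indexes: [key_index, value_index, ...].
    """
    return {keys[tags[i]]: values[tags[i + 1]] for i in range(0, len(tags) - 1, 2)}


def zigzag_decode(n):
    """Undo the zigzag encoding used for geometry parameters."""
    return (n >> 1) ^ -(n & 1)


def decode_point(geometry):
    """Decode a single-point geometry: [MoveTo command, zigzag x, zigzag y]."""
    command_id, count = geometry[0] & 0x7, geometry[0] >> 3
    if command_id != 1 or count != 1:
        raise ValueError("this sketch only handles a single MoveTo command")
    return zigzag_decode(geometry[1]), zigzag_decode(geometry[2])


def tile_xy_to_lat_lon(tile_x, tile_y, zoom, x, y, extent=4096):
    """Convert a position inside a tile to (latitude, longitude) in degrees."""
    n = 2 ** zoom
    # Fractional position across the whole world, 0..1 on both axes.
    fx = (tile_x + x / extent) / n
    fy = (tile_y + y / extent) / n
    lon = fx * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * fy))))
    return lat, lon
```

For example, the specification’s own sample point geometry `[9, 50, 34]` decodes to the tile position (25, 17), which `tile_xy_to_lat_lon` can then place on the globe once the tile’s zoom, x and y are known from the request.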
If you look at the Mapbox Vector Tile specification you’ll notice that there’s not much in there other than layers, lines, points and properties. What actually goes into the tile, and how that data is then used within the app, can all be configured. The list of layers on a tile might be:
earth
landuse
natural
physical_line
physical_point
pois
roads
water
`planetiler` uses this schema and the layers are described here. When displaying the map in the app, a `style.json` file is used to describe how to display the contents of each of the layers. This includes where to get the tiles, the fonts and the icons from. The style has to be matched to the tiles as the naming convention for layers isn’t fully standardized. If a layer isn’t used in the style, then it should be removed from the vector tile generation. Designing the style is a huge amount of work and hopefully we can continue to use the same profile with minimum tweaking.
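As a rough illustration of how a style is matched to the tiles, the sketch below builds a very small MapLibre-style document as a Python dictionary and checks its `source-layer` references against the layer list above. The source name `basemap`, the `tiles.example.com` URLs, the colours and the font name are all placeholders rather than the project’s real style, and `check_style_against_tiles` is a hypothetical helper.

```python
# Layer names taken from the example list of layers on a tile.
TILE_LAYERS = [
    "earth", "landuse", "natural", "physical_line",
    "physical_point", "pois", "roads", "water",
]

# A deliberately tiny style: where the tiles, fonts and icons come from,
# plus one drawing rule per layer that we want to display.
STYLE = {
    "version": 8,
    "sources": {
        "basemap": {
            "type": "vector",
            "tiles": ["https://tiles.example.com/{z}/{x}/{y}.mvt"],  # placeholder
            "maxzoom": 15,
        }
    },
    "glyphs": "https://tiles.example.com/fonts/{fontstack}/{range}.pbf",  # placeholder
    "sprite": "https://tiles.example.com/sprites/light",  # placeholder
    "layers": [
        {"id": "background", "type": "background",
         "paint": {"background-color": "#e0e0e0"}},
        {"id": "water", "type": "fill", "source": "basemap",
         "source-layer": "water", "paint": {"fill-color": "#80deea"}},
        {"id": "roads", "type": "line", "source": "basemap",
         "source-layer": "roads", "paint": {"line-color": "#ffffff", "line-width": 2}},
        {"id": "poi-labels", "type": "symbol", "source": "basemap",
         "source-layer": "pois",
         "layout": {"text-field": ["get", "name"], "text-font": ["Noto Sans Regular"]}},
    ],
}


def check_style_against_tiles(style, tile_layers):
    """Return (missing, unused) layer names.

    missing: source-layers the style refers to that the tiles do not contain.
    unused:  tile layers that no style layer draws (candidates for removal).
    """
    used = {layer["source-layer"] for layer in style["layers"] if "source-layer" in layer}
    missing = sorted(used - set(tile_layers))
    unused = sorted(set(tile_layers) - used)
    return missing, unused
```

A check like this could run whenever the style or the tile configuration changes. A non-empty `missing` list means the map will silently fail to draw something, while `unused` lists data that is being generated and downloaded for no visual benefit (although it might still be wanted for the audio).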
The MapLibre library is doing all of the work for the map rendering to the UI. A URL to a `style.json` is provided along with an API key for the tile provider and that’s all it requires. It caches tiles and deals with all the tile decoding and rendering.
We’re currently looking at whether we can use the same tiles for the audio as for the UI. The answer appears to be yes, but the work is ongoing. The changes required in the app are:
Ensuring that we have the same data available as with `soundscape-backend` may take a little more work, but there is a great benefit to having a single source of mapping data.
We need a lot more concrete data here - these are very initial thoughts.
The app performance has two aspects:
The actual serving of the tile data is likely to be performant enough for us, though presumably there are all sorts of possible improvements that can be done on the Cloudflare side. The main challenge is really the generation of the tile data. The protomaps cost calculator assumes that the `.pmtiles` file is generated offline and then uploaded to Cloudflare. The upload is free, but it is a large file. Here’s what I’ve seen so far:
Action | File size | Time in hours |
---|---|---|
Download of planet.osm.pbf | ~80GB | ~5 |
Build the .pmtiles file with planetiler on linux box (AMD ThreadRipper with 64GB RAM and plenty of SSD) | 76GB | ~5 |
Upload to Cloudflare | 76GB | >8 |
For reference the `planetiler` command used was:
java -Xmx40g -jar planetiler.jar --osm-path=planet-latest.osm.pbf --area=planet --bounds=planet --download --fetch-wikidata --output=output.pmtiles --nodemap-type=array --storage=mmap
And the build summary was:
Max tile sizes
z0 z1 z2 z3 z4 z5 z6 z7 z8 z9 z10 z11 z12 z13 z14 all
boundary 5.4k 37k 43k 25k 20k 36k 29k 27k 22k 33k 36k 30k 31k 21k 17k 43k
landcover 1.5k 990 8.1k 4.6k 3.2k 31k 17k 271k 334k 235k 153k 175k 166k 111k 333k 334k
place 56k 150k 1.2M 1.1M 733k 360k 188k 123k 88k 118k 111k 95k 68k 123k 227k 1.2M
water 8.7k 4.3k 11k 9.9k 17k 13k 91k 114k 132k 119k 133k 94k 167k 102k 91k 167k
water_name 15k 32k 50k 28k 28k 16k 9.3k 9.5k 13k 9.9k 7.4k 6k 3.9k 9.6k 29k 50k
waterway 0 0 0 549 3.7k 1.6k 18k 13k 9.8k 28k 20k 16k 59k 75k 88k 88k
landuse 0 0 0 0 2.6k 3.6k 54k 77k 116k 113k 102k 112k 56k 115k 48k 116k
park 0 0 0 0 56k 204k 130k 87k 99k 83k 90k 55k 47k 19k 50k 204k
transportation 0 0 0 0 59k 59k 59k 123k 88k 134k 91k 68k 313k 190k 129k 313k
transportation_name 0 0 0 0 0 0 38k 22k 19k 14k 37k 23k 36k 28k 179k 179k
mountain_peak 0 0 0 0 0 0 0 18k 20k 17k 15k 14k 11k 337k 234k 337k
aerodrome_label 0 0 0 0 0 0 0 0 8.8k 5.5k 8k 4.5k 4.1k 3.5k 3.3k 8.8k
aeroway 0 0 0 0 0 0 0 0 0 0 16k 27k 36k 31k 17k 36k
poi 0 0 0 0 0 0 0 0 0 0 0 2.1k 69k 32k 1M 1M
building 0 0 0 0 0 0 0 0 0 0 0 0 0 141k 709k 709k
housenumber 0 0 0 0 0 0 0 0 0 0 0 0 0 0 315k 315k
full tile 87k 224k 1.3M 1.2M 773k 532k 410k 475k 412k 431k 305k 216k 398k 341k 1.3M 1.3M
gzipped 51k 127k 637k 583k 385k 281k 248k 327k 292k 303k 220k 154k 210k 227k 779k 779k
Max tile: 1.3M (gzipped: 779k)
Avg tile: 143k (gzipped: 86k) using weighted average based on OSM traffic
# tiles: 271,427,243
# features: 3,479,181,963
Finished in 1h18m15s cpu:34h7m23s gc:6m10s avg:26.2
read 2x(13% 10m17s wait:48m41s done:12s)
merge 1x(14% 10m55s wait:1h2m14s done:16s)
encode 30x(75% 58m54s block:3s wait:8m26s done:16s)
write 1x(7% 5m36s wait:1h9m36s done:16s)
Finished in 4h59m29s cpu:76h10m16s gc:11m13s avg:15.3
FINISHED!
overall 4h59m29s cpu:76h10m16s gc:11m13s avg:15.3
wikidata 6m34s cpu:3h3m54s gc:14s avg:28
lake_centerlines 3s cpu:21s avg:6.9
water_polygons 1m27s cpu:36m29s gc:7s avg:25.3
natural_earth 37s cpu:38s avg:1
osm_pass1 20m23s cpu:2h28m47s gc:7s avg:7.3
osm_pass2 2h16m15s cpu:33h58m31s gc:3m31s avg:15
ne_lakes 0.4s cpu:0.6s avg:1.4
boundaries 11s cpu:16s avg:1.5
agg_stop 0.5s cpu:0.9s avg:2
sort 55m29s cpu:1h53m39s gc:1m4s avg:2
archive 1h18m15s cpu:34h7m23s gc:6m10s avg:26.2
----------------------------------------
archive 76GB
This was using the default styling and configuration.
There’s lots of information on the planetiler GitHub on how to optimize planet-sized tile generation: https://github.com/onthegomap/planetiler/blob/main/PLANET.md. More memory is the main accelerator. In their example they were using a cloud server with 128GB of RAM, and it took 3 hours 21 minutes to do everything including downloading the OSM data. According to the cloud company they use, that would have cost ~$5. The speed of connection in the cloud, both to get the OSM data and to push the result back to Cloudflare, obviously makes this much faster than running on my local machine.
If we could run a similar cloud build, then a weekly rebuild of the planet would be relatively cheap at only ~£20 per month. OSM publish the `planet-latest.osm.pbf` once per week so it could be timed to build from that.
`planetiler` is limited to a maximum zoom level of 15. Tiles at that zoom level are roughly 1223 metres x 1223 metres at the equator, and with `extents` set to 4096 they have a resolution of 30cm. MapLibre can still zoom in further, and although the rendering is done at the higher zoom, the data used will be from the zoom level 15 tile. Soundscape iOS GeoJSON tiles were at zoom level 16, so some care will have to be taken with style and configuration to ensure that we have the same data available.
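The tile size and resolution figures can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function names are made up for illustration, and the optional latitude argument is an addition based on the standard Web Mercator property that ground distances shrink by the cosine of the latitude.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius, as used in the formula


def tile_size_metres(zoom, latitude_deg=0.0):
    """Length of one side of a tile on the ground, in metres."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    return circumference * math.cos(math.radians(latitude_deg)) / 2 ** zoom


def resolution_metres(zoom, extent=4096, latitude_deg=0.0):
    """Ground distance represented by one unit of the tile's x,y grid."""
    return tile_size_metres(zoom, latitude_deg) / extent


if __name__ == "__main__":
    for zoom in (14, 15, 16):
        print(zoom, round(tile_size_metres(zoom), 1), round(resolution_metres(zoom), 3))
```

At zoom 15 this gives a tile side of about 1223 metres and a resolution of just under 0.3 metres at the equator. Away from the equator the tiles cover less ground, so the resolution only gets finer.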
`planetiler` cannot do incremental tile generation; it always starts from scratch. The `planet.osm.pbf` can be updated incrementally, but the main time sink is the tile generation. However, this could be argued to be a benefit, as any issues creating tiles in one build would hopefully be gone by the next build rather than slowly accruing problems.