Tracking Russian jet fuel with rail freight data and Seafowl
How ACDC and Splitgraph helped us beat the supply chain blues
When a war breaks out, supply chains are thrown into disarray. Working on a campaign against Russian fossil fuel exports last year, Sam and I saw first-hand how routes to market for Russian crude oil were instantly disrupted by the country’s full-scale invasion of Ukraine, before being slowly reconfigured into the system we see today: a ‘shadow fleet’ of off-the-books tankers, and crude laundered through Indian and Turkish refineries.
For some commodities, though, war transforms the supply chain more fundamentally. Among these is jet fuel, which powers any modern invasion. While military jet fuel tends to contain additives to boost performance at high altitudes, the base product is the same thing used in civilian airliners, and all the most capital-intensive parts of its production are shared. Wherever you are in the world, the supply chain for civilian jet fuel is the supply chain for military jet fuel, and vice versa — Russia is no exception.
Luckily, dual-use commodities generate dual-use commodities data. Even before the full-scale invasion of Ukraine, industry analysts were using rail freight dispatch data from Russian Railways to track the build-up of oil products at the Ukrainian border, and we took that one step further at Global Witness, working with Le Monde to trace out the full supply chain from oil fields owned by Western companies to air bases near the border with Ukraine — work which colleagues at Der Spiegel and ZDF then expanded.
Behind the scenes, though, we’ve faced our own supply chain difficulties. When putting together the research for the story that would be covered by Le Monde, we had access (for a hefty fee) to a live feed of rail freight dispatch data from a major Western commodities data provider. But once the story had been published, the feed — provided by a Russian subsidiary — mysteriously disappeared from the provider’s platform, and we were never able to get a straight answer about what had happened to it. Such is the fog of war.
To get around the problem, we’ve built an alternative: a proof-of-concept interactive map showing station-to-station flows of jet fuel in Russia for the year 2022, with labels indicating proximity to known refineries and military installations. To put it together, we needed three building blocks:
A high-quality feed of Russian rail freight data. For this, we turned to our colleagues at the Anti-Corruption Data Collective, who were able to draw on years of expertise in obtaining and analysing ‘grey market’ data in the former Soviet Union to obtain bulk data that is far richer than the feed we lost.
A JavaScript library for creating maps. Mapbox GL JS is a no-brainer here.
An easy way to serve the rail freight data for consumption by the web app. For this last step, we decided to use Seafowl, a project which describes itself as an “analytical database designed for modern data‑driven Web applications”.
Developed by Splitgraph, a Cambridge startup, Seafowl sits between Parquet files in an Amazon S3 or Google Cloud Storage bucket and your web app, exposing the data to SQL queries over HTTP. It does this by running a version of the Apache DataFusion query engine, an amazing project which uses the Apache Arrow memory format to enable high-performance analytical queries.
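To make "SQL over HTTP" concrete, here is a minimal Python sketch of how a client might package a query for Seafowl. It assumes the pattern described in Seafowl's documentation as we understand it: a POST to a `/q` endpoint with a JSON body containing a `query` field, answered with one JSON object per line. The host, table name and column names are hypothetical placeholders, not our actual schema.

```python
import json
from urllib.request import Request

# Hypothetical endpoint: replace with the address of your own Seafowl instance.
SEAFOWL_URL = "https://seafowl.example.org/q"


def build_query_request(sql: str, url: str = SEAFOWL_URL) -> Request:
    """Wrap a SQL string in a JSON POST request (assumed Seafowl request shape)."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_rows(payload: str) -> list[dict]:
    """Parse a response body that holds one JSON object per line."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


# Hypothetical table and column names, for illustration only.
SQL = """
SELECT origin_station, destination_station, SUM(tonnes) AS tonnes
FROM jet_fuel_flows
GROUP BY origin_station, destination_station
ORDER BY tonnes DESC
LIMIT 10
"""

request = build_query_request(SQL)

# To run the query against a live server:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       rows = parse_rows(response.read().decode("utf-8"))
```

In a web app the same request would normally be made with `fetch` from the browser. The point is only that the client needs nothing more than an HTTP library and a SQL string.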
The best thing about Seafowl for us, as opposed to a purely browser-based query engine like DuckDB-Wasm, is that it makes intelligent use of browser and CDN caches. Repeated queries in your web app (e.g. a user looking up the same point-to-point flow of jet fuel that they did an hour ago) can be answered from a cache instead of being recomputed, which makes it a good fit for interactive web apps.
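The caching works because a read-only query can be sent as a plain GET request whose URL is derived from the query text, so identical queries map to identical URLs that a browser or CDN can cache. The Python sketch below shows the idea. It assumes, based on our reading of Seafowl's documentation, that the URL is `/q/` followed by the SHA-256 hash of the query, and that the query itself travels percent-encoded in an `X-Seafowl-Query` header. Treat both details, and the hypothetical host, as assumptions to check against the current docs.

```python
import hashlib
from urllib.parse import quote
from urllib.request import Request

# Hypothetical host: replace with the address of your own Seafowl instance.
SEAFOWL_HOST = "https://seafowl.example.org"


def build_cached_request(sql: str, host: str = SEAFOWL_HOST) -> Request:
    """Build a cache-friendly GET request for a read-only query.

    The URL depends only on the query text, so the same query always
    produces the same URL, and HTTP caches can reuse the earlier response.
    """
    query = sql.strip()
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return Request(
        f"{host}/q/{digest}",
        # Percent-encode so that newlines and non-ASCII characters
        # are safe to send in a header (assumed server-side decoding).
        headers={"X-Seafowl-Query": quote(query)},
        method="GET",
    )


first = build_cached_request("SELECT 1")
repeat = build_cached_request("  SELECT 1\n")
other = build_cached_request("SELECT 2")

print(first.full_url == repeat.full_url)  # same query text, same cache key
print(first.full_url == other.full_url)   # different query, different key
```

Note that only the exact query text (after trimming surrounding whitespace) decides the cache key in this sketch, so a web app benefits most when it generates its SQL in a consistent form.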
When working on a live military conflict, time is of the essence — that’s why access to good data and good tools that allow us to work fast, spinning up prototypes in a matter of hours, is so important. At Data Desk, we plan to use a combination of Seafowl and Observable — as demonstrated by Artjoms from Splitgraph — for all public-facing data projects going forward, and we’d be very keen to hear from others using the technology.
In the meantime, if you’re a journalist, researcher or policymaker working on the war in Ukraine and require more detailed information on Russian oil product flows, don’t hesitate to contact ACDC.