The Data Score

Dataset Deep Dives

Data Deep Dive: Predicting China's Exports Using AIS Vessel Tracking Data (+ Other Uses)

We embark on a journey into AIS data in this Data Score entry, the first in a recurring series providing a detailed look at how specific datasets address investment debates.

Jason DeRise
Jun 28, 2023

Welcome to the Data Score newsletter, your go-to source for insights into the world of data-driven decision-making. Whether you're an insight seeker, a unique data company, a software-as-a-service provider, or an investor, this newsletter is for you. I'm Jason DeRise, a seasoned expert in the field of alternative data insights. As one of the first 10 members of UBS Evidence Lab, I was at the forefront of pioneering new ways to generate actionable insights from data. Before that, I successfully built a sellside equity research franchise based on proprietary data and non-consensus insights. Through my extensive experience as a purchaser and creator of data, I have gained a unique perspective that allows me to collaborate with end-users to generate meaningful insights.


Introducing the Data Deep Dive Series

I’ve appreciated all the positive feedback for The Data Score Newsletter as well as suggestions for improvement. One of the requests has been to go deeper into various datasets, explaining the use cases, methodology, and limitations associated with the analytics. The goal is to help insight seekers better understand the potential of various types of datasets and to help data providers think through how they can align their products with the outcomes needed by their clients.

AIS Data Deep Dive

In this entry, we dig into the world of AIS (Automatic Identification System) vessel tracking data1. Analytic techniques applied to the AIS data can help predict China's exports, monitor global trade, and understand real-time economic activity.

The methodology is built around real-time AIS data, tracking the largest container, dry bulk, and liquid tanker ships2. Using extensive metadata on fleets and global ports, as well as the current draft (draught)3 and deadweight4 tonnage of the vessel, a model can determine the aggregated cargo tonnage moving through global maritime points of interest.
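As a rough illustration of the draft-and-deadweight idea above, here is a minimal Python sketch. The source does not say how utilization is derived from draft, so the linear interpolation between an empty (ballast) draft and a fully laden draft is an assumption, and the `VesselReport` fields and function names are hypothetical. Deadweight also covers fuel, stores, and ballast water, so the result is best read as an approximate upper bound on cargo.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class VesselReport:
    """One vessel observation. All field names are hypothetical."""
    vessel_id: str
    current_draft_m: float    # draft reported in the AIS message
    ballast_draft_m: float    # assumed draft when the vessel is empty
    max_draft_m: float        # assumed draft when the vessel is fully laden
    deadweight_tonnes: float  # carrying capacity from fleet metadata


def capacity_utilization(report: VesselReport) -> float:
    """Share of capacity in use, assuming draft scales linearly with load."""
    span = report.max_draft_m - report.ballast_draft_m
    if span <= 0:
        raise ValueError("max_draft_m must exceed ballast_draft_m")
    fraction = (report.current_draft_m - report.ballast_draft_m) / span
    # Clamp to [0, 1] because reported drafts can be noisy or stale.
    return min(1.0, max(0.0, fraction))


def estimated_cargo_tonnes(report: VesselReport) -> float:
    """Utilization applied to deadweight tonnage."""
    return capacity_utilization(report) * report.deadweight_tonnes


def aggregate_cargo_tonnes(reports: Iterable[VesselReport]) -> float:
    """Total estimated tonnage for vessels passing a point of interest."""
    return sum(estimated_cargo_tonnes(r) for r in reports)
```

Under these assumptions, a vessel with a 6 m ballast draft, a 14 m laden draft, and a current draft of 12 m would be treated as 75% utilized.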

First, I will start with the outcomes addressed by the data and approach before walking through the process from accessing the data to enriching the data. I will discuss the limitations, which are always an important component in deciphering insights from alternative data.

[Image: blue and red cargo ship at sea. Photo by Ian Taylor on Unsplash]

Common questions addressed with the data

The potential applications of methodologies leveraging AIS data are wide-ranging. It's an instrument to assess the impact of global events on trade activity, track ship traffic at major ports, and observe changes in supply chains. It offers valuable insights for the analysis of trading activity over time, particularly when paired with deadweight tonnage and draft (draught) data.

  • China cargo export estimates:

    • As the title of this newsletter entry highlights, these estimates are possible with a lead time versus official statistics. Use the draft to calculate capacity utilization, then apply it to the deadweight tonnage to derive the overall tonnage shipped. Validate the relationship through statistical modeling with in-sample, test-sample, and out-of-sample data sets. It is also possible to estimate the number of TEUs (twenty-foot equivalent units), an industry-standard measure of volume, as an approximation of the number of containers shipped.

    • Considerations: For China, adjustments are needed to use satellite AIS data in place of terrestrial AIS data to track vessels close to shore, which differs from the approach in other geographies. The model provides a clear measurement of the volume of maritime shipments, using cargo weight as a proxy. It’s important to note that the weight and value of cargo are not directly related. If the mix of cargo shipped changes such that the value per unit of weight is higher or lower, the statistical relationship with country-reported export value could break down. Another constraint is that the share of higher-value goods shipped by air rather than by sea could change over time, which would weaken the relationship between the AIS data and country-reported export figures. Various sources cite that 60% of China’s exported goods are transported by maritime channels5.

  • International oil and gas supply:

    • Monitor oil and liquefied natural gas (LNG) tanker departures and arrivals at key oil and gas ports, using deadweight and draft to estimate capacity; monitor oil and LNG tankers departing from hard-to-track oil-producing geographies; or understand imports of oil and natural gas into China.

    • Considerations: Vessels’ AIS signals may go dark around some ports, which may mean this methodology cannot directly monitor imports or exports of oil or natural gas to and from sensitive geographies. However, it is possible to build a probabilistic model based on the data that estimates the likelihood that a vessel delivered or picked up oil or gas at a nearby port (or from a secondary vessel).

  • International coal and iron ore supply:

    • Monitoring bulker deadweight tonnage and draft from key ports in the metal and mining supply chains provides a highly accurate view of international supply volume. By digging deeper into the details of the berths and terminals in the port6, it is possible to relate the berths to specific companies, which will allow for accurate estimates of volumes shipped (e.g., Vale’s Brazilian export volumes).

    • Considerations: Some companies may share berths with other companies, which will require an estimate to allocate the volumes shipped between the companies.

  • Port and canal congestion:

    • In 2021, one of the biggest debates in the market was supply chain bottlenecks, ranging from measuring the congestion time at the US West Coast ports to the attempts of the supply chain to work around the bottleneck by using alternate routes (such as via the Panama Canal to get to the East Coast). By tracking the date and time stamps associated with each vessel’s entry into and exit from the port, the median and average time in port can be calculated as a proxy for congestion (note: voyages that merely pass through the geofenced port should be removed from the calculation).

    • Considerations: The port congestion became so severe that monitoring offshore anchorages was also needed to fully understand the situation. For a period, the time in port improved, but only because more time was being spent waiting at anchorage instead of in port.

  • Shipping company fleet performance:

    • Leveraging metadata about the vessels, unique vessels can be mapped back to shipping companies, and metrics around capacity utilization of the fleet, average speeds, distance traveled, and number of ports entered can provide insight into how the quarter is progressing. Pairing it with market freight rate data could further enhance insights on company performance.

    • Considerations: Vessel ownership is constantly changing, so a process of mapping vessels back to owners needs to be maintained on an ongoing basis. Different metrics may need to be calculated on a pro forma and as-is basis depending on the acquisition and divestment activity of the fleet.
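To make the port and canal congestion metric from the list above more concrete, here is a minimal Python sketch that computes median and average time in port from entry and exit timestamps. The source does not say how pass-through voyages are identified, so the minimum-dwell threshold used to drop them (`min_dwell`, defaulting to two hours) is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Iterable, Optional, Tuple


def port_dwell_stats(
    visits: Iterable[Tuple[datetime, datetime]],
    min_dwell: timedelta = timedelta(hours=2),  # assumed pass-through cutoff
) -> Optional[dict]:
    """Summarize time in port from (entry, exit) timestamp pairs.

    Visits shorter than min_dwell are treated as voyages that only crossed
    the geofence and are excluded from the statistics.
    """
    dwell_hours = []
    for entered, exited in visits:
        dwell = exited - entered
        if dwell < min_dwell:
            continue  # likely a pass-through, not a port call
        dwell_hours.append(dwell.total_seconds() / 3600)

    if not dwell_hours:
        return None  # no qualifying port calls in the period

    return {
        "visits": len(dwell_hours),
        "median_hours": median(dwell_hours),
        "mean_hours": mean(dwell_hours),
    }
```

The same function could be run separately on an anchorage geofence, since the considerations above note that waiting time can shift from the port to the anchorage.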

These questions can be addressed even more accurately by combining multiple datasets so that each offsets the limitations of the others. This can be done either by applying a Bayesian approach to probabilistic outcomes or by combining the datasets in a nowcasting model.
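As one illustration of a Bayesian approach to a probabilistic outcome, the Python sketch below applies Bayes' rule to the "dark vessel" case mentioned earlier: updating the probability that a vessel called at a nearby port as pieces of evidence arrive. The function names are hypothetical, every probability passed in would have to be estimated from data, and applying the updates one after another assumes the pieces of evidence are conditionally independent.

```python
from typing import Iterable, Tuple


def bayes_update(prior: float, p_if_true: float, p_if_false: float) -> float:
    """Posterior probability of a hypothesis after one piece of evidence.

    p_if_true:  probability of seeing the evidence if the hypothesis holds
    p_if_false: probability of seeing the evidence if it does not
    """
    numerator = p_if_true * prior
    denominator = numerator + p_if_false * (1.0 - prior)
    if denominator == 0:
        raise ValueError("evidence is impossible under both hypotheses")
    return numerator / denominator


def posterior_from_evidence(
    prior: float, likelihoods: Iterable[Tuple[float, float]]
) -> float:
    """Apply bayes_update sequentially (assumes independent evidence)."""
    probability = prior
    for p_if_true, p_if_false in likelihoods:
        probability = bayes_update(probability, p_if_true, p_if_false)
    return probability
```

For example, the evidence could be a draft change observed after the signal gap, supplied as one `(p_if_true, p_if_false)` pair, followed by a second pair from another dataset.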

As a summary, the table below outlines sample investment outcomes needed, the related metrics, and the logic connecting the two.

This isn’t a complete list of use cases and investment debates to be addressed.

Data providers operating in this space should feel free to chime in with other use cases in the comments. Let’s get the conversation going.


In the rest of this Data Score Newsletter entry, we’ll go into more detail about how the raw data can be transformed into insights that address these use cases, expanding on some of the ideas introduced above.

Underlying Data

There are two datasets that should be sourced and joined: AIS data and points of interest data.
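A minimal Python sketch of what joining the two datasets could look like: each AIS position is tagged with the point of interest whose geofence contains it. This is an illustration under stated assumptions, not the method described in the source. The geofences are simplified to latitude/longitude bounding boxes (real port boundaries would typically be polygons, and boxes crossing the antimeridian are not handled), and the class, function, and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PortGeofence:
    """A point of interest simplified to a bounding box (assumption)."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def tag_pings_with_port(
    pings: Iterable[dict], geofences: List[PortGeofence]
) -> List[dict]:
    """Return a copy of each AIS ping with a 'port' key added.

    Each ping is a dict with 'lat' and 'lon' keys (hypothetical schema).
    'port' is the first matching geofence name, or None when at sea.
    """
    tagged = []
    for ping in pings:
        port = next(
            (g.name for g in geofences if g.contains(ping["lat"], ping["lon"])),
            None,
        )
        tagged.append({**ping, "port": port})
    return tagged
```

Consecutive tagged pings for the same vessel could then be grouped into port entries and exits.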
