Getting Started with Data APIs for Maritime Analysis

A practical guide to fetching energy and commodity data from free APIs. Learn to pull oil prices, shipping indices, and more using Python.

Introduction

Every maritime data project starts the same way: you need data. And while premium services like Clarksons Research and Baltic Exchange offer comprehensive shipping data, there's a lot you can do with free APIs.

This guide covers practical techniques for fetching energy and commodity data that's relevant to maritime analysis. By the end, you'll have working Python code to pull:

  • Crude oil prices (WTI, Brent)
  • Natural gas prices
  • Historical commodity data

The EIA API

The U.S. Energy Information Administration (EIA) provides one of the best free data APIs for energy markets. Registration is free, and the API is well-documented.

Getting Your API Key

  1. Go to eia.gov/opendata
  2. Register for a free account
  3. Request an API key (instant approval)
  4. Save your key somewhere secure
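For step 4, an environment variable keeps the key out of your source files and commit history. A minimal sketch (the variable name EIA_API_KEY is just a convention, not something the API requires):

```python
import os

# Read the key from the environment instead of hardcoding it.
# Set it first in your shell: export EIA_API_KEY="your_key_here"
API_KEY = os.environ.get("EIA_API_KEY", "")

if not API_KEY:
    print("Warning: EIA_API_KEY is not set; API requests will be rejected.")
```

The examples below hardcode the key for brevity, but in any shared or committed code, read it from the environment like this.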

Your First Request

Let's fetch WTI crude oil prices:

import requests
import pandas as pd

API_KEY = "your_api_key_here"
BASE_URL = "https://api.eia.gov/v2"

def fetch_wti_prices():
    """Fetch daily WTI crude oil spot prices."""
    endpoint = f"{BASE_URL}/petroleum/pri/spt/data/"

    params = {
        "api_key": API_KEY,
        "frequency": "daily",
        "data[0]": "value",
        "facets[product][]": "EPCWTI",  # WTI Crude
        "sort[0][column]": "period",
        "sort[0][direction]": "desc",
        "length": 365  # Last year
    }

    response = requests.get(endpoint, params=params)
    response.raise_for_status()  # Surface HTTP errors early
    data = response.json()

    # Convert to DataFrame
    df = pd.DataFrame(data["response"]["data"])
    df["period"] = pd.to_datetime(df["period"])
    df["value"] = pd.to_numeric(df["value"])

    return df[["period", "value"]].rename(
        columns={"period": "date", "value": "wti_price"}
    )

# Usage
prices = fetch_wti_prices()
print(prices.head())

Output:

        date  wti_price
0 2025-01-03      73.42
1 2025-01-02      72.87
2 2024-12-31      71.23
3 2024-12-30      70.98
4 2024-12-27      70.12

Fetching Multiple Products

You can request multiple petroleum products in one call:

def fetch_petroleum_prices(products: list[str], days: int = 365):
    """
    Fetch prices for multiple petroleum products.

    Product codes:
    - EPCWTI: WTI Crude
    - EPCBRENT: Brent Crude
    - EPD2DXL0: NY Harbor ULSD (Diesel)
    - EPMRU: US Gulf Coast Conventional Gasoline
    """
    endpoint = f"{BASE_URL}/petroleum/pri/spt/data/"

    params = {
        "api_key": API_KEY,
        "frequency": "daily",
        "data[0]": "value",
        "sort[0][column]": "period",
        "sort[0][direction]": "desc",
        "length": days
    }

    # Add product facets
    for i, product in enumerate(products):
        params[f"facets[product][{i}]"] = product

    response = requests.get(endpoint, params=params)
    response.raise_for_status()  # Surface HTTP errors early
    data = response.json()

    df = pd.DataFrame(data["response"]["data"])
    df["period"] = pd.to_datetime(df["period"])
    df["value"] = pd.to_numeric(df["value"])

    # Pivot to get products as columns
    df_pivot = df.pivot(
        index="period",
        columns="product",
        values="value"
    ).reset_index()

    return df_pivot

# Fetch WTI and Brent
prices = fetch_petroleum_prices(["EPCWTI", "EPCBRENT"])
print(prices.head())

Yahoo Finance for Shipping Stocks

For stock prices, yfinance is the go-to library:

import yfinance as yf

def fetch_shipping_stocks(tickers: list[str], period: str = "1y"):
    """Fetch daily prices for shipping stocks."""
    # auto_adjust=False keeps the "Adj Close" column; recent yfinance
    # versions adjust prices in place by default and drop it
    data = yf.download(tickers, period=period, progress=False,
                       auto_adjust=False)

    # Get adjusted close prices
    prices = data["Adj Close"]
    if isinstance(prices, pd.Series):
        # A single ticker can come back as a Series; normalize to DataFrame
        prices = prices.to_frame(tickers[0])
    return prices

# Major dry bulk stocks
dry_bulk = ["SBLK", "GOGL", "EGLE", "GNK"]
prices = fetch_shipping_stocks(dry_bulk)
print(prices.tail())

Combining Oil and Stock Data

Here's where it gets interesting: correlating energy prices with shipping stocks:

def analyze_oil_shipping_correlation():
    """Analyze correlation between oil prices and tanker stocks."""

    # Get oil prices
    oil = fetch_petroleum_prices(["EPCWTI", "EPCBRENT"], days=365)
    oil = oil.rename(columns={"period": "date"})
    oil = oil.set_index("date")

    # Get tanker stocks
    tankers = ["DHT", "FRO", "NAT", "INSW"]
    stocks = fetch_shipping_stocks(tankers, "1y")
    # Drop any timezone so the index aligns with EIA's naive dates
    if stocks.index.tz is not None:
        stocks.index = stocks.index.tz_localize(None)

    # Merge and calculate correlations
    combined = oil.join(stocks, how="inner")
    correlations = combined.corr()

    print("Correlation: WTI vs Tanker Stocks")
    print(correlations["EPCWTI"][tankers])

    return combined, correlations

combined, corr = analyze_oil_shipping_correlation()
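One caveat: correlations on raw price levels can be inflated by shared trends, so it's often more informative to correlate daily returns. A self-contained sketch of that variant (synthetic data and stand-in column names, since the real series require API access):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-02", periods=250, freq="B")

# Synthetic price levels: two trending series with correlated daily shocks
oil_ret = rng.normal(0, 0.01, 250)
stock_ret = 0.6 * oil_ret + rng.normal(0, 0.01, 250)
levels = pd.DataFrame({
    "EPCWTI": 70 * np.exp(np.cumsum(oil_ret)),
    "DHT": 10 * np.exp(np.cumsum(stock_ret)),
}, index=dates)

# Correlate daily percentage returns, not levels
returns = levels.pct_change().dropna()
corr = returns.corr().loc["EPCWTI", "DHT"]
print(f"Return correlation: {corr:.2f}")
```

Swapping `combined.corr()` for `combined.pct_change().dropna().corr()` applies the same idea to the real merged data.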

Handling Rate Limits and Errors

APIs have limits. Here's a robust pattern:

import time
from typing import Optional

def fetch_with_retry(
    url: str,
    params: dict,
    max_retries: int = 3,
    delay: float = 1.0
) -> Optional[dict]:
    """Fetch with exponential backoff retry."""

    for attempt in range(max_retries):
        try:
            response = requests.get(url, params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            if response.status_code == 429:  # Rate limited
                wait = delay * (2 ** attempt)
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(delay)

    return None
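With the defaults above (delay=1.0, max_retries=3), the waits after successive 429 responses grow geometrically:

```python
# Exponential backoff schedule for delay=1.0 and max_retries=3
delay, max_retries = 1.0, 3
waits = [delay * (2 ** attempt) for attempt in range(max_retries)]
print(waits)  # prints [1.0, 2.0, 4.0]
```

Tune delay and max_retries to the API's published limits; the EIA API is generous, but other providers throttle much more aggressively.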

Caching Data Locally

Don't hammer APIs unnecessarily. Cache locally:

import json
from pathlib import Path
from datetime import datetime, timedelta

CACHE_DIR = Path("./data_cache")
CACHE_DIR.mkdir(exist_ok=True)

def get_cached_or_fetch(
    cache_key: str,
    fetch_func,
    max_age_hours: int = 24
) -> pd.DataFrame:
    """Return cached data or fetch fresh."""

    cache_file = CACHE_DIR / f"{cache_key}.parquet"

    # Check cache
    if cache_file.exists():
        age = datetime.now() - datetime.fromtimestamp(
            cache_file.stat().st_mtime
        )
        if age < timedelta(hours=max_age_hours):
            return pd.read_parquet(cache_file)

    # Fetch fresh
    df = fetch_func()
    df.to_parquet(cache_file)
    return df

# Usage
wti = get_cached_or_fetch("wti_daily", fetch_wti_prices)

Next Steps

Once you have data flowing, you can:

  • Calculate rolling correlations between rates and stocks
  • Build alert systems for price movements
  • Create visualizations and dashboards
  • Feed data into ML models
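The first of those, rolling correlations, is nearly a one-liner in pandas once two series share an index. A sketch with synthetic data (column names are placeholders for the real merged series):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2024-01-02", periods=200, freq="B")
df = pd.DataFrame({
    "wti_price": 70 + np.cumsum(rng.normal(0, 1, 200)),
    "tanker_stock": 10 + np.cumsum(rng.normal(0, 0.2, 200)),
}, index=dates)

# 30-day rolling correlation of daily returns
returns = df.pct_change().dropna()
rolling_corr = returns["wti_price"].rolling(30).corr(returns["tanker_stock"])
print(rolling_corr.dropna().tail())
```

Plotting `rolling_corr` over time shows when the oil-tanker relationship strengthens or breaks down, which is usually more interesting than a single full-period number.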

For maritime-specific data, you'll eventually want access to:

  • Baltic Exchange: Freight rate indices (BDI, BDTI)
  • Clarksons Research: Vessel values, charter rates
  • VesselsValue: Fleet analytics
  • MarineTraffic: AIS position data

These are paid services, but the free APIs covered here will get you started on real analysis.

Code Repository

Complete code examples from this guide: github.com/dhrstrijker/maritime-data-apis