I wanted to backtest the classic pairs trading strategy myself and try it on different stock baskets—shipping, utilities, tech. I built a backtest app with Claude Code to do exactly that. But how does the strategy actually work?
The Distance Approach
The concept is simple: when two stocks that normally move together drift far apart, you expect them to converge again.
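To define what "far" means, Gatev et al. (2006) (the "GGR" method, after Gatev, Goetzmann and Rouwenhorst) first normalize each stock's price to a cumulative total return index over a 12-month formation period. For every candidate pair, they take the daily difference between the two normalized price series (the spread), square it, and add these squares up into the Sum of Squared Distances (SSD). They also record the standard deviation of the spread. The 20 pairs with the lowest SSD are then traded over the following 6 months: if a pair's spread moves more than 2 historical standard deviations away from zero, a trade is entered. If not, no trade is made.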
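The formation step can be sketched in a few lines of Python. This is a minimal illustration, not the code from the repo: it normalizes plain closing prices to start at 1.0 (GGR use total return indices that include dividends), and the function names and the `price_table` layout are assumptions.

```python
from itertools import combinations
from statistics import pstdev


def normalize(prices):
    """Rescale a price series so it starts at 1.0.

    A simple stand-in for a cumulative total return index: dividends are ignored.
    """
    first = prices[0]
    return [p / first for p in prices]


def rank_pairs_by_ssd(price_table, top_n=20):
    """Return up to top_n (ssd, ticker_a, ticker_b, sigma) tuples, lowest SSD first.

    price_table: dict mapping ticker -> list of daily closing prices, all the
    same length and covering the formation period.
    sigma is the (population) standard deviation of the pair's spread.
    """
    normalized = {t: normalize(p) for t, p in price_table.items()}
    results = []
    for a, b in combinations(sorted(normalized), 2):
        # Spread = normalized price of A minus normalized price of B, per day.
        spread = [x - y for x, y in zip(normalized[a], normalized[b])]
        ssd = sum(s * s for s in spread)
        results.append((ssd, a, b, pstdev(spread)))
    results.sort()  # lowest SSD first
    return results[:top_n]
```

The sigma kept for each pair is what later defines the 2σ entry threshold during the trading period.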
The trading logic works like this, where the spread is stock A's normalized price minus stock B's normalized price:
- New trading day: check whether we're already in a position
  - If not in a position: does the spread exceed ±2σ?
    - If spread > 0: enter SHORT (sell A, buy B)
    - If spread < 0: enter LONG (buy A, sell B)
    - Otherwise (within ±2σ): no action
  - If in a position: has the spread crossed zero?
    - Yes: exit the position
    - No: have we held too long? If so, force an exit; otherwise keep holding
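The decision tree above maps onto a small state machine. This sketch assumes the spread is stock A's normalized price minus stock B's, that σ comes from the formation period, and that `max_hold` (a hypothetical parameter) implements the forced exit; GGR themselves close any open position at the end of the 6-month trading period.

```python
def ggr_signals(spreads, sigma, entry_k=2.0, max_hold=None):
    """Walk through daily spread values and return a list of (day, action) events.

    spreads: normalized price of A minus normalized price of B, one value per day
    sigma: standard deviation of the spread measured in the formation period
    max_hold: optional maximum holding period in days (hypothetical parameter)
    """
    events = []
    position = None  # None, "LONG" or "SHORT"
    entry_day = None
    for day, s in enumerate(spreads):
        if position is None:
            # Not in a position: look for a divergence beyond entry_k * sigma.
            if s > entry_k * sigma:
                position, entry_day = "SHORT", day  # A is rich: sell A, buy B
                events.append((day, "ENTER SHORT"))
            elif s < -entry_k * sigma:
                position, entry_day = "LONG", day  # A is cheap: buy A, sell B
                events.append((day, "ENTER LONG"))
        else:
            # In a position: exit when the spread crosses zero...
            crossed = (position == "SHORT" and s <= 0) or (position == "LONG" and s >= 0)
            # ...or force an exit once the holding period runs out.
            held_too_long = max_hold is not None and day - entry_day >= max_hold
            if crossed or held_too_long:
                events.append((day, "EXIT" if crossed else "FORCE EXIT"))
                position, entry_day = None, None
    return events
```

Transaction costs and position sizing are left out on purpose; the sketch only covers the entry and exit rules.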
An Example: The Gold-to-Silver Ratio
A classic pairs trade example is the gold-to-silver ratio: the price of one ounce of gold divided by the price of one ounce of silver. When this ratio is historically low, gold is cheap compared to silver. Where you used to get more than 80 ounces of silver for 1 ounce of gold, at times you might only get 54.
How do you bet on this? Simply shorting silver seems foolish in a rising market. Instead, you short silver and buy gold at the same time. If the market keeps rising, gold will likely rise too, offsetting the losses on your silver short. If the market falls, silver will likely drop harder than gold, so the gain on your silver short outweighs the loss on your gold long, and you profit.
This is the beauty of pairs trading: you're betting on the relationship between two assets, not on market direction.
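To see why direction doesn't matter, here is a minimal sketch of the P&L of a dollar-neutral position: long one ounce of gold and short enough silver to match its dollar value. The prices are hypothetical, chosen so the entry ratio is 54, and the function name is made up for illustration.

```python
def pair_pnl(gold_entry, silver_entry, gold_exit, silver_exit, gold_oz=1.0):
    """P&L of a dollar-neutral trade: long gold_oz of gold, short silver of equal value."""
    # Ounces of silver shorted; equals the gold-to-silver ratio when gold_oz is 1.
    silver_oz = gold_oz * gold_entry / silver_entry
    long_pnl = gold_oz * (gold_exit - gold_entry)  # gain or loss on the gold long
    short_pnl = silver_oz * (silver_entry - silver_exit)  # a short gains when the price falls
    return long_pnl + short_pnl


# Hypothetical entry: gold at $2,700 and silver at $50, a ratio of 54.
falling_market = pair_pnl(2700, 50, 2430, 40)  # gold -10%, silver -20%
rising_market = pair_pnl(2700, 50, 2970, 52.5)  # gold +10%, silver +5%
```

With these numbers the trade makes $270 in the falling market and $135 in the rising one. In both cases the ratio rose (to 60.75 and about 56.6), and that is what the position is really betting on.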
Testing the GGR Method
Using the software I created with Claude Code, I can now backtest the GGR method on any list of stocks. You can download the repo and try it yourself.
However, the results were disappointing, to say the least: I have not been able to make a profit with any set of stocks other than utilities. My first thought when I saw the results was to throw the computer out of the window, but I needed it to figure out why there's such a big gap between what Zhu (2024) found and what I found.
Why the Discrepancy?
The first obvious explanation I could think of: Zhu (2024) used a much bigger stock universe, while I was testing on the S&P 500. Although this seemed plausible, I initially didn't think it would make a big difference, because the real problem seemed to be that a substantial percentage of pairs that should have converged simply never did.
But it turns out I was wrong.
When you add more stocks, the number of candidate pairs grows much faster than the number of stocks, so you find more pairs with very low SSD values. But does that matter enough? Since I didn't have access to the full stock universe, I tested this indirectly by running the strategy on only the top 5 and top 10 pairs instead of the top 100.
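The growth in candidate pairs is easy to check: a universe of n stocks contains n choose 2 pairs. In this sketch the 3,000-stock universe is a hypothetical size for comparison, not the one Zhu (2024) used.

```python
from math import comb


def candidate_pairs(n_stocks):
    """Number of distinct pairs in a universe of n_stocks (n choose 2)."""
    return comb(n_stocks, 2)


# Roughly the S&P 500 versus a hypothetical 3,000-stock universe.
sp500_pairs = candidate_pairs(500)  # 124,750 pairs
broad_pairs = candidate_pairs(3000)  # 4,498,500 pairs
```

Six times as many stocks gives about 36 times as many candidate pairs, so the lowest SSDs found in a larger universe can be much lower.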
Results by Number of Pairs
| Pairs Traded | Annual Return |
|---|---|
| Top 100 | -1.2% |
| Top 10 | -0.17% |
| Top 5 | +0.2% |
Running with the top 10 pairs still produced a loss, but a smaller one: 0.17% per year. Only when I restricted the strategy to the top 5 pairs did it finally turn a profit, at 0.2% per year.
That is not good at all, but at least it isn't completely useless. It suggests that the strategy works better on a bigger stock universe, where there are more pairs with extremely low SSDs to choose from.
The Flaw
Krauss (2015) pointed out two major flaws with the GGR strategy. Instead of just telling you what they are, let me guide you to the answer so it sticks.
Get into the mind of a trader. What do traders want? They want to maximize profits. But how do you maximize profits in pairs trading?
- Do you really want to trade pairs that stay very close together (low SSD)?
- Do you want pairs that mean-revert often or rarely?
- Does the GGR method account for mean-reversion frequency?
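Think about it: a pair with very low SSD might stay close together precisely because it rarely diverges enough to be worth trading. Its spread has little room to move, so even when it does cross the 2σ threshold, the convergence earns very little. The GGR method optimizes for closeness, not for trading opportunities.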
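A short derivation makes this concrete. Assume the spread s_t is the difference of the two normalized prices on day t of a T-day formation period, and σ is its population standard deviation (GGR's exact estimator may differ slightly). Then the SSD caps σ:

```latex
\mathrm{SSD} = \sum_{t=1}^{T} s_t^2,
\qquad
\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}\left(s_t - \bar{s}\right)^2
         = \frac{\mathrm{SSD}}{T} - \bar{s}^2
         \le \frac{\mathrm{SSD}}{T}
\quad\Longrightarrow\quad
2\sigma \le 2\sqrt{\mathrm{SSD}/T}
```

Because a trade is opened at roughly 2σ and closed when the spread returns to zero, a round trip typically captures a spread move of about 2σ, in normalized price units and before costs. With a hypothetical SSD of 0.05 over 252 trading days, that bound is 2√(0.05/252) ≈ 0.028, less than 3 cents per dollar of normalized price.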
What This Means for Maritime Stocks
I'm still running this strategy on shipping stocks because the sector has some advantages:
- Natural segmentation: Tankers, dry bulk, and containers face different market forces
- High correlation within segments: Companies in the same segment respond to the same freight rates
- Volatility: Shipping stocks move a lot, creating potential divergences
But I'm now much more careful about pair selection. Rather than blindly taking the lowest-SSD pairs, I'm looking for pairs that:
- Have historically mean-reverted multiple times
- Show sufficient divergence to actually trigger trades
- Have reasonable liquidity for both long and short positions
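One way to turn the first two criteria into numbers is sketched below. It is a hypothetical screen, not the repo's implementation: zero crossings of the spread serve as a proxy for mean reversion, separate excursions beyond k·σ count the divergences that would have triggered trades, and the thresholds `min_crossings` and `min_divergences` are arbitrary placeholders. Liquidity is not covered.

```python
from statistics import pstdev


def count_zero_crossings(spread):
    """Count how often the spread changes sign (days exactly at zero are skipped)."""
    signs = [1 if s > 0 else -1 for s in spread if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def count_divergences(spread, k=2.0):
    """Count separate episodes in which |spread| moves beyond k * sigma."""
    sigma = pstdev(spread)
    episodes, outside = 0, False
    for s in spread:
        if abs(s) > k * sigma:
            if not outside:  # a new excursion beyond the band starts here
                episodes += 1
            outside = True
        else:
            outside = False
    return episodes


def passes_screen(spread, min_crossings=4, min_divergences=2, k=2.0):
    """Hypothetical filter: keep pairs that mean-revert often and diverge enough to trade."""
    return (count_zero_crossings(spread) >= min_crossings
            and count_divergences(spread, k) >= min_divergences)
```

A pair whose spread hugs zero without ever changing sign fails this screen even if its SSD is tiny, which is exactly the case the SSD ranking misses.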
The live signals page shows the current status of each pair, but treat it as research, not trading advice. The strategy's edge, if it exists, is thin.
Code
The backtesting code is open source: github.com/dhrstrijker/ggr-backtest
Note: I used Claude Code to write the code and haven't manually verified every line yet. I did write tests to check that the calculations are correct, but I'll be going through the code manually as well.
References
- Do, B., & Faff, R. (2010). Does simple pairs trading still work? Financial Analysts Journal, 66(4), 83-95.
- Gatev, E., Goetzmann, W. N., & Rouwenhorst, K. G. (2006). Pairs trading: Performance of a relative-value arbitrage rule. Review of Financial Studies, 19(3), 797-827.
- Krauss, C. (2015). Statistical arbitrage pairs trading strategies: Review and outlook. IWQW Discussion Papers, No. 09/2015.
- Zhu, X. (2024). Examining pairs trading profitability. Unpublished senior essay, Yale University.