Statistical Arbitrage: Foundations of a Mean-Reversion Model in Oil Futures
November 5, 2024
Author: Lewin Hafner
Background & Disclaimer: This is neither a strategy nor a model that can or should be deployed in the market. It primarily serves to illustrate some knowledge and ideas I've accumulated over time. All text is authored by me, as I do not rely on AI tools such as ChatGPT - therefore, there might be errors in grammar and punctuation. The Python and R code is included at the bottom of this page.
Introduction
Contrary to what a majority of retail participants think, markets can be viewed through either a directional or a relative lens. While the former is concerned with bets on where a certain market is headed (up or down), relative value deals with the relative pricing of two or more assets. Think of Pepsi and Coca-Cola: as both companies share many of the same industry characteristics and their products display a high degree of substitutability, their share prices are expected to follow roughly the same path. History shows that they do - but to varying degrees. This gives rise to statistical arbitrage, a class of strategies that employ statistical rigor to exploit perceived relative mispricings. One key advantage of such strategies is their beta-neutral profile, potentially allowing for returns that are orthogonal to the market. The following piece of work delves deeper into statistical arbitrage by (i) shedding light on some of the theory, assumptions and concepts involved and (ii) developing the foundations of a strategy that aims to capture such relative mispricings in oil futures.
Efficient Market Hypothesis and Market Anomalies
One primary function of markets of all sorts is to match buyers and sellers and ensure efficient pricing of tradables. Central to this tenet is what financial theory refers to as the "efficient market hypothesis" (EMH), which states that asset prices reflect all available information and are thus always aligned with their fair value. Perhaps not explicitly - but EMH implies that alpha generation (excess returns over a benchmark) through stock picking and market timing is not possible, and that participants would be better off with passive exposure to the market. Accordingly, market bubbles would not (or do not) exist, research is obsolete, and market regulation as well as government intervention should be limited. EMH's claim that information is instantly embedded in prices also implies that prices react only to novel information - and as information is not predictable, asset prices are not predictable but random (see Malkiel (2003) and Dupernex (2007) for discussions).
I believe there are three points worth highlighting. First, numerous studies document anomalies, patterns and abnormal behavior in financial markets. Huberman and Regev (2001), for example, present a case where a New York Times article led to a parabolic move in biotech stocks, although the news had been published in a scientific journal five months prior. This goes to show that the diffusion of news can create inefficiencies, i.e. situations where asset prices do not fully reflect their fair values. Another anomaly extensively covered by research is the momentum effect - the observation that rising (declining) prices tend to be self-reinforcing. If market progressions followed a "random walk" (hence the market had no "memory"), there would be no room for self-reinforcing dynamics within the system. Events like the liquidation cascade in GameStop in early 2021, however, make that hard to believe. For a great review and discussion, see Schwert (2002) and Malkiel (2003).
Second, the debate about market efficiency is actually two debates: whether markets are efficient, and whether deviations from efficiency can be exploited. Just because inefficiencies are (seemingly) not exploitable to the extent that they can generate alpha doesn't mean they don't exist. The pursuit of above-market returns is hindered by market frictions such as transaction costs. As a consequence, asset prices can remain dislocated for prolonged periods.
Third, the spirit of EMH seems to be tied to the idea of a homo oeconomicus - an economic model that postulates that humans behave in ways that maximize their utility (i.e. fully rationally). Accordingly, humans are assumed to be rational in the way they process new information, evaluate different options and make financial trading decisions. Insights from behavioral finance clearly reject these assumptions: humans are prone to a variety of cognitive biases such as confirmation bias (i.e. filtering information in favor of pre-existing beliefs), recency bias (i.e. giving more weight to recent events than to those in the distant past), overconfidence (i.e. an overestimation of one's abilities) and loss aversion (i.e. losses instill more pain than wins induce pleasure) (see XYZ for a great review). Taken together, such distortions pave the way for periods in which asset prices decouple from their fundamental value, creating mispricing opportunities. Lastly, my humble opinion is this: ask any practitioner who deals in discretionary trading and you'll realize that markets are anything but rational.
Mean-Reversion and Cointegration
The discussion of EMH above introduced inefficiencies, a key assumption underlying many StatArb models that try to exploit them. The question is how exactly such inefficiencies are identified.
StatArb - and, more narrowly, pairs trading - is centered around the concept of mean-reversion, a statistical phenomenon whereby a time series tends to fluctuate around its mean. Formally known as "regression toward the mean", it refers to the observation that the larger the deviation of a single observation from its population mean, the higher the probability that the next observation lies closer to the mean. Intuitively, the mean acts like a central tendency: the further a value deviates from it, the stronger the gravitational pull back toward it.
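As a toy illustration of this pull toward the mean (my own illustrative example, not part of the model), the sketch below simulates a mean-reverting AR(1) process and checks that observations sitting far above the mean are, on average, followed by a move back down:

```python
import numpy as np

rng = np.random.default_rng(42)

# Mean-reverting AR(1): x_t = mu + phi * (x_{t-1} - mu) + eps_t, with |phi| < 1
# pulling every observation back toward the long-run mean mu.
mu, phi, sigma, n = 0.0, 0.7, 1.0, 5000
x = np.empty(n)
x[0] = mu
for t in range(1, n):
    x[t] = mu + phi * (x[t - 1] - mu) + rng.normal(0.0, sigma)

# Regression toward the mean: conditional on x_t sitting far above mu,
# the expected next step E[x_{t+1} - x_t] = (phi - 1) * (x_t - mu) is negative.
far_above = x[:-1] > 2.0
steps_after = np.diff(x)[far_above]
print(steps_after.mean())  # negative on average: pull back toward the mean
```

The closer phi gets to 1, the weaker that pull becomes; at phi = 1 the process degenerates into a random walk with no mean to revert to.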
Normally, relevant long-short pairs are identified in a process that considers dozens of possible combinations or even entire groups of similarly-behaving assets. Common approaches include the distance approach, which leverages distance metrics such as Euclidean distance; cointegration, which seeks to find long-term equilibria; stochastic control; and others (see Krauss (2017) for a thorough review). We center our work around cointegration and focus on one specific pair for which we have fundamental reasons to believe it is cointegrated.
Cointegration is a statistical property of two or more time series that tells us whether they share a long-term equilibrium relationship. It is well-suited for pairs trading for two reasons. First, pairs trading revolves around the idea of trading similarly-behaving assets - cointegration allows us to identify such pairs in a quantitative manner. Second, models built on the premise of mean-reversion require consistent statistical properties (mean, variance etc.) to ensure model profitability throughout time. As we will see, cointegration naturally tells us whether that is the case.
Formally, two variables X and Y are said to be cointegrated if both are integrated of order d > 0 and there exists a linear combination aX + bY that is integrated of some order d' < d. The order of integration, in this context, refers to the number of times a variable needs to be differenced to make it stationary (stationarity ensures consistency in mean, variance etc.).
\( X \sim I(d) \quad \text{and} \quad Y \sim I(d), \quad d > 0 \)
\( \Delta^d X \quad \text{and} \quad \Delta^d Y \quad \text{are stationary} \)
\( aX + bY \sim I(d') \quad \text{with} \quad d' < d \)
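The definition above can be made concrete with a small simulation (illustrative only): two series that share a random walk component are each I(1), yet the right linear combination strips out the common trend and is stationary. A crude variance comparison stands in for a formal test here:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# X is a random walk, hence I(1): it becomes stationary only after differencing once.
x = np.cumsum(rng.normal(size=n))

# Y inherits X's stochastic trend plus stationary noise, so Y is I(1) as well ...
b = 0.8
y = 5.0 + b * x + rng.normal(size=n)

# ... yet the linear combination y - b*x cancels the common trend and is I(0).
spread = y - b * x

# Crude fingerprint (a real test would use ADF): a random walk's sample variance
# keeps growing with the sample, while a stationary series' variance stabilizes.
print(np.var(x))       # large, dominated by the stochastic trend
print(np.var(spread))  # small, roughly the variance of the noise
```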
Engineering a mean-reverting spread
Now that some of the prerequisites have been visited, we can proceed with the actual development of our model.
Different crude oils mainly differ in their geographical origin, density and sulfur content. Two widely quoted types of oil are WTI and Brent, with the latter serving as a widely used benchmark. These differences, along with market frictions such as transaction and transportation costs, give rise to a spread between the prices of the two (see Geyer-Klingeberg and Rathgeber (2021) for a more nuanced view). Within the context of pairs trading, a spread refers to either a) the absolute difference between two prices, b) a ratio that expresses the price of one asset in terms of another, somewhat like a relative strength measure, or c) the residuals of a linear regression that establishes a linear relationship between two variables. These different ways of engineering a spread have distinct implications. In the case of a synthetic ratio, buying (selling) the spread translates to longing (shorting) Brent and shorting (longing) WTI, effectively hedging our directional exposure in dollar terms. A truly beta-neutral implementation, on the other hand, incorporates a hedge ratio b (derived from linear regression) that establishes how many units of WTI need to be shorted (longed) per unit of longed (shorted) Brent. It will become clear later on how cointegration relates to a beta-neutral ratio.
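The three spread constructions can be sketched as follows; the price series here are purely hypothetical stand-ins for Brent and WTI, and NumPy's polyfit stands in for a full OLS regression:

```python
import numpy as np
import pandas as pd

# Hypothetical price paths standing in for Brent and WTI (not real data).
rng = np.random.default_rng(1)
common = np.cumsum(rng.normal(size=500))
brent = pd.Series(80.0 + common + rng.normal(size=500), name="brent")
wti = pd.Series(76.0 + 0.9 * common + rng.normal(size=500), name="wti")

# a) absolute difference between the two prices
diff_spread = brent - wti

# b) ratio, expressing Brent in terms of WTI (a relative strength measure)
ratio_spread = brent / wti

# c) regression residuals: the slope b is the hedge ratio, i.e. how many units
#    of WTI to short (long) per unit of longed (shorted) Brent.
b, a = np.polyfit(wti, brent, deg=1)
resid_spread = brent - (a + b * wti)  # OLS residuals have mean zero by construction
```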
As laid out above, cointegration is essential to pairs trading as it ensures a) a meaningful statistical relationship between the two assets and b) that a linear combination of the two has consistent properties that align with the concept of mean-reversion. One possible way to test for cointegration is the Engle-Granger test, which is carried out in two steps. First, determine whether X (WTI) and Y (Brent) are integrated of order d > 0; to do so, we test for stationarity using the ADF test. Per the results below, we cannot reject the null hypothesis of a unit root for either WTI or Brent, indicating they are likely non-stationary processes. Second, regress Y on X using OLS and determine whether the residuals are stationary, again using the ADF test. Per the results, our residuals are stationary, leaving us with a spread that aligns with the principle of mean-reversion.
Augmented Dickey-Fuller Test Results

| Metric               | Brent   | WTI     | Residuals (Spread) |
|----------------------|---------|---------|--------------------|
| ADF statistic        | -2.3297 | -2.5596 | -3.9713            |
| p-value              | 0.1626  | 0.1016  | 0.0016             |
| Critical value (1%)  | -3.4319 | -3.4319 | -3.4319            |
| Critical value (5%)  | -2.8622 | -2.8622 | -2.8622            |
| Critical value (10%) | -2.5671 | -2.5671 | -2.5671            |
Taken together, these results suggest that WTI and Brent are indeed cointegrated. More intuitively, they tell us that even if Brent and WTI decouple in the short run, they eventually end up in the same place in the long run. Additionally, we can infer that the residuals of our regression are stationary and, by the same token, that our beta-neutral spread is stationary.
We proceed with modelling by extracting these residuals (the spread) from our linear regression. We normalize the scale by calculating Z-scores, thereby ensuring that values are expressed in terms of standard deviations from the mean. As the plot below shows, our time series does not trend, aligning with the property of stationarity.
As implied above, our aim is to buy the spread (long Brent; short WTI) or sell the spread (short Brent; long WTI) when it significantly deviates from its own mean. We need a quantifiable way to decide whether to buy or sell the spread (i.e. we need to construct an entry model). To do so, we calculate Z-scores based on rolling exponential moving averages and rolling standard deviations, which allows for increased adaptability across different volatility regimes. Further, we set thresholds at Z-scores of 1 and 1.5 standard deviations to generate buy and sell signals. The first plot below visualizes when those signals are flashed; the second plot shows how they translate to our spread (the one we would actually trade).
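A minimal sketch of such an entry model, assuming a hypothetical stationary spread and using +/- 1.5 sigma as the single illustrative entry threshold (the window length is likewise an assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical stationary spread (in practice: the regression residuals).
rng = np.random.default_rng(3)
vals = np.zeros(750)
for t in range(1, 750):
    vals[t] = 0.9 * vals[t - 1] + rng.normal()
spread = pd.Series(vals)

# Z-score against a rolling EWM mean and a rolling standard deviation so the
# entry thresholds adapt to the prevailing volatility regime.
window = 60
ewm_mean = spread.ewm(span=window, adjust=False).mean()
roll_std = spread.rolling(window).std()
z = (spread - ewm_mean) / roll_std

# Entry signals: sell the spread (short Brent / long WTI) when it is stretched
# upward, buy the spread (long Brent / short WTI) when stretched downward.
signal = pd.Series(0, index=spread.index)
signal[z > 1.5] = -1
signal[z < -1.5] = 1
```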
Limitations and Improvements
The results look promising, given the simplistic nature of our model. A great number of short-term swings are captured, indicating that our model could be of use. However, there are various factors we would need to consider before developing a strategy based on it. First, our lookback period is far too short: while our model covers 3.5 years' worth of data, models in a real-world setting require 10+ years of data to be considered significant. Second, models intended for real-world use need to be tested. This is usually done by splitting the data into two parts, using the former (in-sample data) to build the model and the latter (out-of-sample data) to test how the model performs when given new data. Third, costs associated with trading (transaction costs, slippage, rolling fees, market impact) need to be considered in order to render a realistic picture of model profitability. Fourth, in live trading - as the name implies - data is computed in real time. Our model uses daily closing prices, meaning no intraday movements are considered. This could lead to situations in which our model does not flash buy/sell signals, or simply does so too late, thereby limiting profitability.
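The in-sample/out-of-sample split mentioned above might look like this in code; the spread series and the 70/30 split ratio are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical daily spread series; a real-world model would want 10+ years of data.
rng = np.random.default_rng(5)
vals = np.zeros(1000)
for t in range(1, 1000):
    vals[t] = 0.95 * vals[t - 1] + rng.normal()
spread = pd.Series(vals)

# Chronological split (never shuffle a time series): the first 70% builds the
# model, the untouched final 30% tests it on data it has never seen.
split = int(len(spread) * 0.7)
in_sample, out_of_sample = spread.iloc[:split], spread.iloc[split:]

# Parameters (here simply mean and std for z-scoring) come from in-sample only ...
mu, sigma = in_sample.mean(), in_sample.std()

# ... and are applied unchanged out-of-sample to gauge how the model generalizes.
z_oos = (out_of_sample - mu) / sigma
```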
With these things in mind, this piece of work serves as a great foundation for future explorations in relative value and statistical arbitrage.
References
Geyer-Klingeberg, J., & Rathgeber, A. W. (2021). Determinants of the WTI-Brent price spread revisited. The Journal of Futures Markets, 41(5), 736-757. https://doi.org/10.1002/fut.22184
Huberman, G., & Regev, T. (2001). Contagious Speculation and a Cure for Cancer: A Nonevent That Made Stock Prices Soar. The Journal of Finance, 56(1), 387-396. https://www.jstor.org/stable/222474
Krauss, C. (2017). Statistical Arbitrage Pairs Trading Strategies: Review and Outlook. Journal of Economic Surveys, 31(2), 513-545. https://doi.org/10.1111/joes.12153