If you begin monitoring some variable thing over time, initially you’re going to see lots of new extremes: new record highs, for example. As your dataset grows, the frequency of new ‘records’ should decline sharply. If the overall system behavior is static, new records rapidly become very rare indeed, governed by the statistics of the variation. A feature of the 165-year-long global temperature series in recent years is that new record highs have not been rare at all. Depending on who you listen to, there were new record hottest years in 1998, 2005, 2010 and 2014. That, of course, is because the system is nowhere near static; global temperatures have been increasing at an accelerating rate.
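To see how quickly records dry up in a static system, here is a minimal simulation sketch. It assumes the observations are independent draws from one fixed normal distribution, which is an idealisation and not a claim about the temperature data; the function names are made up for illustration. Under that assumption, the chance that the n-th observation sets a new record is exactly 1/n.

```python
import random

def count_records(series):
    """Return the indices at which a new running maximum (a 'record high') occurs."""
    records = []
    best = float("-inf")
    for i, x in enumerate(series):
        if x > best:
            best = x
            records.append(i)
    return records

def record_rate_by_position(n_steps, n_trials, seed=0):
    """Estimate, for each position in a static series, the probability that
    the observation at that position sets a new record.

    Assumption: observations are independent standard-normal draws.
    """
    rng = random.Random(seed)
    hits = [0] * n_steps
    for _ in range(n_trials):
        series = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
        for i in count_records(series):
            hits[i] += 1
    return [h / n_trials for h in hits]

rates = record_rate_by_position(n_steps=165, n_trials=2000)
for n in (1, 2, 10, 50, 165):
    print(f"observation {n:3d}: simulated {rates[n - 1]:.3f}, theory 1/n = {1 / n:.3f}")
```

Summing 1/n over 165 independent observations gives an expected total of only about 5.7 records, most of them in the first few steps.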

Assuming acceleration continues, how common will ‘new record highs’ become in future years? Again, that is a question for statistical prediction. The standard approach¹ goes like this:

- Separate the effects. Separate the system behavior into a secular component — a “trend” — that is not stochastic, and a variation component — “noise” — that is².
- Model the trend, preferably by applying some fundamental understanding of the system, or, less satisfactorily, by fitting a suitably parsimonious function.
- Subtract the trend from the observations to obtain a sample of the noise (the “residuals”).
- Model the noise … carefully; it may not be as simple as it looks.
- Obtain future stochastic projections by extending the trend (e.g. by extrapolation) and adding back random “realisations” of the noise model (“Monte Carlo simulation”).
- Perform statistics on the resulting synthetic projections to obtain the expectation of the future system behavior.
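The steps above can be sketched end to end in a few lines. This is a toy version under loud assumptions: the "observations" are synthetic, the trend is a straight line fitted by least squares (rather than the polynomial used in this post), and the noise model is plain white noise. All names and numbers are hypothetical.

```python
import random
import statistics

def fit_linear_trend(t, y):
    """Ordinary least-squares fit y ~ a + b*t; returns (a, b)."""
    t_mean = statistics.fmean(t)
    y_mean = statistics.fmean(y)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    return y_mean - b * t_mean, b

def project(t_obs, y_obs, t_future, n_realisations=100, seed=1):
    """Trend-plus-noise Monte Carlo projection (white-noise placeholder)."""
    rng = random.Random(seed)
    # Model the trend.
    a, b = fit_linear_trend(t_obs, y_obs)
    # Subtract the trend to obtain the residuals.
    residuals = [yi - (a + b * ti) for ti, yi in zip(t_obs, y_obs)]
    # Model the noise (here: just its standard deviation).
    sigma = statistics.stdev(residuals)
    # Extend the trend and add back random realisations of the noise.
    realisations = [
        [a + b * ti + rng.gauss(0.0, sigma) for ti in t_future]
        for _ in range(n_realisations)
    ]
    # Perform statistics on the synthetic projections.
    expectation = [statistics.fmean(column) for column in zip(*realisations)]
    return (a, b), sigma, realisations, expectation

# Synthetic stand-in data: made-up slope and scatter, not real temperatures.
data_rng = random.Random(0)
t_obs = list(range(1970, 2015))
y_obs = [0.017 * (ti - 1970) + data_rng.gauss(0.0, 0.1) for ti in t_obs]
t_future = list(range(2015, 2051))

(a, b), sigma, realisations, expectation = project(t_obs, y_obs, t_future)
print(f"fitted slope: {b:.4f} per year, residual sd: {sigma:.3f}")
print(f"expected value in {t_future[-1]}: {expectation[-1]:.2f}")
```

Swapping in a better trend function or a better noise model only changes the two lines that fit them; the rest of the pipeline stays the same.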

## The trend

For global temperature, there is in fact a quite deep fundamental understanding to draw on, from global climate modelling. For this effort I’m not going there. Instead I’m just going to use the crude polynomial fit I’ve been using in the charts, which nevertheless manages to capture both the data and the mechanistic projections (IPCC “middle estimates”) pretty well.

## The noise

Since the trend diverges a bit from the data in earlier years and the “system” was possibly a little different then (e.g. higher estimation errors), I’m going to model just a recent noise sample — since 1970. That bit looks like this:

Well that’s easy; a normal distribution should do it … except. The things to check here are, first, that the noise pattern remains similar over time (is it *homoscedastic* — this is, near enough), and second, whether there is *autocorrelation*. Is this month’s noise independent of last month’s, or are they related? The month before? (And so on…) Answer — they’re definitely not independent:

The monthly temperatures are strongly autocorrelated, but with a rapid decline in the correlation coefficient for “lags” out to about one year (the grey band is ±5% zero uncertainty). Then there’s another broad bump of autocorrelation at about 3 years, possibly reflecting the effect of ENSO (El Niño / La Niña).

There exists an alphabet soup of complex-sounding but really pretty simple models to deal with autocorrelated noise: AR, ARMA, ARIMA, ARCH, GARCH. Most work by combining an independent normal random component (“white noise”) with one or more trailing chain combinations of prior values, either just of the whole, or separately of the whole and the white bit. I’m going to use the simplest: an autoregressive (“AR”) model. That combines white noise with a linear combination³ of the chain of composite prior values. The model parameters are the standard deviation of the white noise, the length of the trailing chain and the (fixed) factors to apply at each lag. The simplest, a chain of length one month (“AR(1)” in the plot), doesn’t do it, but one of just two months’ length (“AR(2)”) works pretty well (subtracting it from the sample noise moves most of the autocorrelations into the zero uncertainty range). Of course such a short chain is unlikely to adequately model ENSO⁴ (it just shrinks it into the zero uncertainty range). There are better ways, but we’re trying to keep this simple.
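For readers who want to try this, here is one way an AR(2) model could be fitted and checked. The post does not say which fitting method was used, so the Yule-Walker equations below are an assumption. The "sample" is synthetic, generated with made-up factors (0.5 and 0.3) so the fit can be checked against known values, and is longer than the real monthly record to keep the estimates stable.

```python
import math
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (mean removed internally)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n))
    return ck / c0

def fit_ar2(x):
    """Yule-Walker estimates (phi1, phi2, sigma_white) for an AR(2) model."""
    r1, r2 = autocorr(x, 1), autocorr(x, 2)
    phi1 = r1 * (1 - r2) / (1 - r1 ** 2)
    phi2 = (r2 - r1 ** 2) / (1 - r1 ** 2)
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    # Variance left for the white-noise part once the chain is accounted for.
    sigma = math.sqrt(var * (1 - phi1 * r1 - phi2 * r2))
    return phi1, phi2, sigma

def simulate_ar2(phi1, phi2, sigma, n, rng, burn_in=200):
    """Generate n values of x[t] = phi1*x[t-1] + phi2*x[t-2] + white noise."""
    x1 = x2 = 0.0
    out = []
    for i in range(n + burn_in):
        x0 = phi1 * x1 + phi2 * x2 + rng.gauss(0.0, sigma)
        if i >= burn_in:  # discard the start-up transient
            out.append(x0)
        x2, x1 = x1, x0
    return out

rng = random.Random(42)
sample = simulate_ar2(0.5, 0.3, 0.1, 5000, rng)  # synthetic stand-in noise
phi1, phi2, sigma = fit_ar2(sample)
print(f"phi1={phi1:.3f} phi2={phi2:.3f} white-noise sd={sigma:.3f}")

# Subtract the fitted chain; what is left should look like white noise.
innov = [sample[i] - phi1 * sample[i - 1] - phi2 * sample[i - 2]
         for i in range(2, len(sample))]
print(f"lag-1 autocorrelation: raw {autocorr(sample, 1):.3f}, "
      f"after AR(2) {autocorr(innov, 1):.3f}")
```

The last two lines mirror the check described above: once the fitted chain is subtracted, the remaining lag-one autocorrelation should fall close to zero.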

## The result

What do we get? Unsurprisingly, the records keep coming as warming accelerates. The result looks like this (the future temperature trace is a single realisation; I actually did 100):

It’s apparent that we’re already warming so rapidly that *more than one year in three should be a new record*, on average. Within a few decades it will be one year in two, then, late in the century, nearly every year will be a new record hottest year, if carbon emissions have not abated.
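The record-counting step can be illustrated with a toy calculation. Everything numeric here is invented for illustration: the trend coefficients, the noise level and the starting record are not the fitted values behind the plot, and the noise is annual white noise rather than the monthly AR(2) model. The sketch only shows the mechanism, namely that a rising trend turns rare records into routine ones.

```python
import random

def record_probability(trend, sigma, prior_record, n_realisations=1000, seed=3):
    """For each future year, the fraction of realisations in which that year
    exceeds every earlier value, including the pre-existing record."""
    rng = random.Random(seed)
    hits = [0] * len(trend)
    for _ in range(n_realisations):
        best = prior_record
        for i, level in enumerate(trend):
            value = level + rng.gauss(0.0, sigma)
            if value > best:
                best = value
                hits[i] += 1
    return [h / n_realisations for h in hits]

# Illustrative numbers only (hypothetical units and coefficients).
years = list(range(2015, 2101))
flat = [0.0 for _ in years]
accelerating = [0.015 * k + 0.0002 * k ** 2 for k in range(len(years))]

p_flat = record_probability(flat, sigma=0.1, prior_record=0.2)
p_acc = record_probability(accelerating, sigma=0.1, prior_record=0.2)

print(f"expected records 2015-2100, static system: {sum(p_flat):.1f}")
print(f"expected records 2015-2100, accelerating trend: {sum(p_acc):.1f}")
print(f"record probability in {years[-1]}: {p_acc[-1]:.2f}")
```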

### Notes

1. Interestingly, while this approach is now massively entrenched, it’s certainly not the only game in town … at least not every exact step as listed.

2. It’s rare that such a separation will be entirely valid, but this is *a model*; it is wrong *by definition*.

3. The chains overlap at each new increment of course, so the net effect is far from linear — it’s an infinite polynomial. Lag-one AR — “AR(1)” — leads to an exponentially decaying autocorrelation pattern.

4. A popular ENSO indicator, the southern oscillation index, can be adequately modelled at monthly resolution with an ARMA(1,8;1) or a combined AR-ARCH model (Ahn and Kim, 2005). I’ve used the former pretty extensively for hydrological simulation. It’s likely that an ARMA model would work a little better here too (or maybe just AR(1,2,8), given the correlation of temperature and ENSO).

Ref: Ahn, Jae H., and Heung S. Kim. “Nonlinear modeling of El Nino/southern oscillation index.” Journal of Hydrologic Engineering 10.1 (2005): 8-15.

5. Note that the model future temperature realisation in the last plot has a near decade-long ‘pause’ in the 2030s. Actually, no it doesn’t. I made that with a random number generator. It’s just an artifact of the noise model; a completely normal, natural, expected effect of random variation, enhanced here by strong autocorrelation. (This month depends on last month which depended on the month before. Occasionally they’re going to get stuck following each other, before breaking out to chase the trend again.)
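The AR(1) remark in note 3 can be written out explicitly. The notation is introduced here and is not from the post: $\phi$ is the lag-one factor, $\varepsilon_t$ is white noise with standard deviation $\sigma$, and $\rho(k)$ is the autocorrelation at lag $k$.

```latex
\begin{aligned}
x_t &= \phi\, x_{t-1} + \varepsilon_t, \qquad |\phi| < 1 \\
    &= \varepsilon_t + \phi\,\varepsilon_{t-1} + \phi^{2}\varepsilon_{t-2} + \cdots
     = \sum_{j=0}^{\infty} \phi^{j}\, \varepsilon_{t-j}, \\[4pt]
\operatorname{Var}(x_t) &= \frac{\sigma^{2}}{1 - \phi^{2}}, \\[4pt]
\rho(k) &= \phi^{k} \quad (k \ge 0), \qquad
\rho(k) = e^{-k/\tau} \ \text{with} \ \tau = -\frac{1}{\ln \phi} \ \text{when} \ 0 < \phi < 1 .
\end{aligned}
```

The infinite sum in the second line is the "infinite polynomial" in powers of $\phi$. When $\phi$ is close to 1, each random shock fades slowly, which is why a strongly autocorrelated series can wander away from its trend for years at a time, as in note 5.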