Log Pearson type 3 in Excel

Updated: skewness symbol "κ" replaced by "γ₁", for consistency.
Also see Fitting the Pearson type 3, Pearson type 4 in Excel.

The world of statistics seems every bit as bitchy and divided as any field. Bayesians vs frequentists? Nah, I’m talking a hundred years ago and a giant with dubious politics called Karl Pearson. If you’ve heard of him at all it might be via a thing called the Pearson product, a standard measure of correlation between datasets. But his great work, the Pearson family of continuous probability distributions seems all but ignored. Well, not quite, as we’ll see.

Pearson distributions still get a run in an arcane corner of engineering hydrology, where, thanks to the US Army Corps of Engineers more than half a century ago, one of them has become the go-to choice in flood flow statistics.  (It was also used to derive one of the sets of standard Australian design rainfall intensities I wrote about last year.) The Pearson type III distribution is used because it adds an important extra parameter to the standard bell curve (the normal distribution) to allow for skewness — cases where excursions on one side of the bell stretch a little further than those on the other. Skewness is characterised by another Pearson contribution: the third moment of the distribution (the first moment gives the mean, the second the variance … and the fourth a weird thing called kurtosis). Skewness is usually represented by γ₁, the third standardised central moment (γ₁ is what the reference calls “κ”, also sometimes referred to as √β₁). The Excel skewness functions return estimates of γ₁.

Pearson type III

Engineers often look in vain for a Pearson type III in all manner of statistics and computation tools. They whinged long and hard to a software provider I know until he added one to his specialist simulation tool. Why isn’t it there and what do I do about it? The answer is entirely semantic, and the fix ridiculously simple. The problem is that the world of statistics doesn’t call this distribution after Pearson, even though he seems to have been the first to describe it; instead they refer to it as a special case of the stock-standard gamma distribution. Call it a shifted gamma distribution or a three-parameter gamma distribution and it’s, oh … you meant that!

The math is deadly dull simple, and goes thus:

PIII(x)  =  ΓD(x-ϒ, α, β) ; γ₁ > 0

PIII(x)  =  ΓD(ϒ-x, α, β) ; γ₁ < 0

or if you like:

PIII(x)  =  ΓD((x-ϒ).γ₁/|γ₁|, α, β)

where ΓD is the standard gamma distribution and:

“shape”    α = 4/γ₁²

“scale”      β = |σ γ₁/2|         (which is   β = σ/√α )

“shift”       ϒ = μ – 2σ/γ₁       (which is   ϒ = μ ± αβ , depending on the sign of γ₁ )

with μ, σ, and γ₁ being the mean, standard deviation and skewness of the distribution.

And of course if you wanted the logarithmic transformation, it’s just:

LPIII(x)  =  ΓD((log(x)-ϒ).γ₁/|γ₁|, α, β)

where α, β, ϒ and μ, σ, γ₁ are defined in terms of log(x) instead of x (base 10 is usual).

Excel

So no Pearson III in Excel?  Actually it’s been there for decades … try (in Excel 2010+ speak):

PIII(x)  =  GAMMA.DIST((x – (μ-2*σ/γ₁)) * SIGN(γ₁),  4/γ₁^2, ABS(σ*γ₁/2),  FALSE)

And for the cumulative form and its inverse, it’s simpler to write the positive and negative skew cases separately:

P(x)  =  IF(γ₁>0, GAMMA.DIST(x – (μ-2*σ/γ₁),  4/γ₁^2, σ*γ₁/2,  TRUE), 1 – GAMMA.DIST(-x + μ-2*σ/γ₁,  4/γ₁^2, -σ*γ₁/2,  TRUE))

x(P)  =  IF(γ₁>0, GAMMA.INV(P,  4/γ₁^2,  σ*γ₁/2) + μ-2*σ/γ₁, -GAMMA.INV(1 – P,  4/γ₁^2,  -σ*γ₁/2) + μ-2*σ/γ₁)

The distribution is not defined for x beyond the shift, which is the lower bound for the positive skew case and the upper bound for the negative; nor — as is obvious from the math — if γ₁ is exactly zero (when it ought to collapse to the normal distribution).  Excel will deliver up an error for those cases, so it may be useful for completeness to write:

P(x) = IFERROR(IF(γ₁>0, GAMMA.DIST(x – (μ-2*σ/γ₁), 4/γ₁^2, σ*γ₁/2, TRUE),
IF(γ₁<0, 1 – GAMMA.DIST(-x + μ-2*σ/γ₁, 4/γ₁^2, -σ*γ₁/2, TRUE),
NORM.DIST(x, μ, σ, TRUE))), NA())

x(P) = IFERROR(IF(γ₁>0, GAMMA.INV(P, 4/γ₁^2, σ*γ₁/2) + μ-2*σ/γ₁,
IF(γ₁<0, -GAMMA.INV(1 – P, 4/γ₁^2, -σ*γ₁/2) + μ-2*σ/γ₁,
NORM.INV(P, μ, σ))), NA())

Fitting the distribution

If you consult a statistician on distribution fitting you’ll immediately be told not to fit “on the moments” (that is using Pearson’s moments: the mean, standard deviation and skew of the sample) because they’ll give a biased estimate.  No doubt that’s precisely correct, but I observe that statistics is, by definition, about probabilities — it is precise only in the sense that it’s not.  By all means use some fancier fitting scheme, but don’t lose sight of the fact that if you’re fitting to an ordered series with probabilities assigned by rank (as in flood frequency analysis), you’re dealing with a gross approximation from the start.  Applying fancy math to the fitting task might not improve things much, but always taking care to review the outcome thoroughly just might.

Anyway, maximum likelihood fitting is not that difficult to implement in Excel.

Example

A sample Excel Pearson III implementation can be downloaded here, along with a (rather slow) implementation of the Pearson distribution family, using PearsonDS: https://gergs.net/?attachment_id=2918

Reference

Using modern computing tools to fit the Pearson Type III distribution to aviation loads data, US DOT Office of Aviation Research, 2003.