Pearson type 4 in Excel

Updated...
Also see Pearson type 3 in Excel, Fitting the Pearson type 3

We did type III, so what about the Pearson type IV probability distribution? One author calls the type IV a Cinderella distribution¹ — it’s a beautiful thing, but completely lost to most. The Pearson distributions are a family of unimodal continuous probability distributions systemitised by the great mathematician Karl Pearson over a century ago. They subsume a range of well know distributions — normal, gamma, beta, beta prime, exponential, t, F, chi-squared, Cauchy — but extend beyond that straightjacket in ways practitioners could benefit from, but are often unaware of.

Type IV adds adjustable kurtosis to the “skewed normal” bell curve of the type III. Kurtosis, characterised by Pearson’s fourth moment of the distribution², is a measure of the “tightness” of the distribution peak — is it narrow with lots of stuff happening out in the tails (high kurtosis), or fat with thin tails (low kurtosis). The standard measure, β₂, is the fourth central moment of the distribution standardised by the variance squared. For the normal distribution β₂ = 3, so it’s useful to define a thing called excess kurtosis, designated γ₂ = β₂ – 3, which is zero for the normal distribution. The kurtosis function in Excel gives an estimate of γ₂.

Here’s what type IV looks like with some extra kurtosis added to the parameters I used for the type III plot³:

Pearson type IVPearson type III

 

The Pearson type IV is potentially one of the most useful corners of the Pearson family, but until the last decade or so it went virtually unknown and unused. The problem is that the thing is devilishly difficult to implement. Late in his life Pearson himself regretted failing to provide a workable implementation⁴:

“The series of tables ought to include tables of the incomplete G-function, i.e., the probability integral of the type IV curve, but the age of the present editor is likely to preclude his superintending any task, which even exceeds in the magnitude of its calculations that of the incomplete B-Function.”

Well, computing has moved on a bit, and the complex gamma function and Gauss hypergeometric function needed to compute type IV are available in many math code libraries. Physicist Joel Heinriech sets out how to go about it⁵ … but even then there are issues and tricks, as outlined by Martin Becker and Stefan Klößner for their R implementation⁶.

 

Excel

Rather than try to home brew our own version, it’s easier and better to just port to Becker and Klößner’s excellent PearsonDS package for R.

There are many ways to invoke R from Excel, including the comprehensive RExcel addin. For simplicity and to avoid yet another bloated installation, for this example I’ve just used the VBA shell interface suggested by UK R-blogger Shashia. Unfortunately that method can only return an answer via a file write/read operation, so is rather slow. All that’s required is some VBA code embedded in a macro enabled spreadsheet (*.xlsm) to create suitable custom Excel functions, plus a bit of linking R-script (which I’ve called Pearson.R ⁷) that needs to be in the same directory as the spreadsheet.

My example implementation can be downloaded here: http://gergs.net/?attachment_id=2918 ¹⁰.

Requirements:

 

References and notes:

  1. Cheng, R (2011) Using pearson type IV and other Cinderella distributions in simulation Proceedings of the Winter Simulation Conference (pp. 457-468).
     
  2. Moment nomenclature is stupidly dense in many treatments, because lots choose to mix, mash and mystify. Put as clearly as possible, it goes like this:

    Moment is a term stolen from engineering, where it refers to an effect acting over a distance (or the sum of that combination over some space). So in statics, a force acting over a lever arm gives a bending moment (force times distance). In dynamics, a rotating force acting at a distance out from the centre of a shaft is a torque (also force times distance). And if the “effect” is a mass at a distance out from the centre of a shaft, the sum around the shaft of mass times distance squared is the moment of inertia.

    Analogously, for statistical distributions, a moment amounts to the sum (integral!) of all the distribution ordinates times their offsets from the adopted moment centre — raised to the power of the moment level (first moment just the straight offset, second offset squared, etc.). Then we find that:

    • The first raw moment (moment about the origin) is the mean (μ), and the first central moment (moment about the mean) is of course zero
    • The second central moment is the variance (σ²).  The second standarised central moment is 1.0, because, well, the standard “standardisation” is to divide by the standard deviation to the power of the moment level ( ÷ σ), which is of course just the variance here ( σ²/σ² = 1 ).
    • The third standardised central moment is the skewness (our γ₁, also designated √β₁ … which really means ±√β₁ since it can take either sign)
    • The fourth standardised central moment is the total kurtosis (β₂ — that’s total, not excess; excess kurtosis is γ₂ = β₂ – 3)

    So when you read that the “moments” of a Pearson type IV are μ, σ², √β₁ and β₂, the author is actually mixing three different types of moments.
    (I of course go one better by using μ, σ, γ₁ and γ₂, which amounts to three different types of moments plus two popular kludges.)

    Then of course there are moments of the distribution and moments of the sample. Moments of the distribution are just descriptors of the position and shape of the distribution function. For the Pearson distributions they completely and uniquely describe the distribution, which is to say they determine the parameters of the distribution, therefore they can be used as the parameters. Moments of the sample are estimates derived from a sample, for example using the Excel functions: AVERAGE(), STDEV(), SKEW(), KURT(). They may or may not be good descriptors of the underlying probabilistic process leading to the sample variates.
     

  3. It’s worth noting in passing that the excess kurtosis (γ₂) of the Pearson type III distribution is not, in general, zero — it’s just uniquely determined by the skewness (γ₁). The formula is γ₂ = α/6 = 3γ₁²/2, so the type III excess kurtosis for the example parameters is about 1.22. (You can check that by entering “=3*γ₁^2/2” for γ₂ in Pearson.xlsm. It will return the type III … rather slowly of course.)
     
  4. Woodward, WA (1971) Approximation of Pearson type IV probability integral MSc thesis, Texas Tech University
     
  5. Heinrich, J (2004) A guide to the Pearson type IV distribution University of Pennsylvania.
     
  6. Becker, M and Klößner S (2013) Package ‘PearsonDS’, Version 0.97, R Project.
     
  7. The script actually links to the generic PearsonDS calls, so it will invoke the Pearson family — not just IV — depending on the moments provided. (In skewness-kurtosis space there is a small gap between types III and IV, occupied by types V and VI. The gap becomes large at large skewness and kurtosis, but beyond the level of many practical applications.)
     
  8. I prefer to add this to the system path, where it will affect all users.
     
  9. You only need to load these once. In Windows 7+ you’ll need to run the R application (RGui.exe) with administrator privileges (right click|Run as administrator) to install the packages to the main library; otherwise you have to accept installation to a user library where they’ll be unavailable to others.
     
  10. Pearson.R must be in the same directory as Pearson.xlsm. Note that Pearson.xlsm sets Excel recalculation to manual. You need to press F9 to force a recalculation (which takes a while, depending on the total number of R-linked function calls you’ve used).