Making Sense With Sparse Data
 
H. Paul Barringer, PE
Barringer & Associates, Inc.
Humble, Texas
hpaul@barringer1.com

Very Brief Summary-
Poisson distributions allow use of sparse data to make sense of data that looks like nonsense! The distribution has only a single parameter: the expected mean number of events.

The Big Picture-
Sparse data can provide useful information when analyzed with the Poisson distribution, and sparse data may be the only data you have for making decisions. Remember, some data is better than no data! You only need the average (mean) number of events to calculate the probability of events. The Poisson distribution then gives the probabilities of 0, 1, 2, 3, … events occurring.

For example, since 1886 (the first year for which data is available), 41 storms classified as hurricanes have passed within 75 miles of the Houston/Galveston, Texas warning area as of 2014. According to climatecentral.org, a hurricane-strength storm is expected to strike the Texas coast once every 9 to 16 years. How do we make sense of these facts? The data covers 128 years with 41 hurricanes recorded, so the average is 41/128 = 0.32 hurricanes/year. This expected number of hurricane events is the single input that drives the Poisson distribution, as described in Figure 1. In any given year we should expect a 72.6% chance of being hurricane free, a 23.2% probability of exactly one hurricane, and a 3.72% probability of two hurricanes, based on data acquired over 128 years.
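The hurricane percentages follow directly from the Poisson probability mass function, P(k) = λ^k e^(-λ) / k!, where λ is the expected mean. The short Python sketch below reproduces them from the rounded mean of 0.32 hurricanes/year used in the text; the function name `poisson_pmf` is our own illustration, not part of the author's worksheet.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the expected number of events is `mean`."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

# 41 hurricanes in 128 years, rounded to 0.32 per year as in the text
mean = round(41 / 128, 2)

for k in range(4):
    print(f"P({k} hurricanes in a year) = {poisson_pmf(k, mean):.2%}")
```

Running it prints 72.61%, 23.24%, 3.72% and 0.40% for 0, 1, 2 and 3 hurricanes, which match the rounded figures quoted above.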

Figure 1: Hurricanes Expected in the Houston/Galveston Texas Area

You can download this Excel worksheet, which handles up to 30 occurrences based on the average value.

Notice in Figure 2 how the shape of the reliability curves changes with the expected mean number of events.
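A sketch of how curves like those in Figure 2 could be generated. It assumes that "reliability" at k occurrences means the cumulative probability of k or fewer events, P(X ≤ k). That reading matches the zero-occurrence reliabilities quoted later in this article (for example, e^(-0.039) ≈ 96.2%). The means 0.5, 1, 2 and 5 are illustrative choices, not the values plotted in Figure 2.

```python
import math

def poisson_cdf(k: int, mean: float) -> float:
    """Cumulative probability of k or fewer events, P(X <= k)."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) computed from P(X = i - 1)
        total += term
    return total

# Illustrative means only; Figure 2's exact means are not listed in the text
for mean in (0.5, 1.0, 2.0, 5.0):
    curve = [poisson_cdf(k, mean) for k in range(8)]
    print(f"mean={mean}: " + " ".join(f"{r:.3f}" for r in curve))
```

Small means put most of the probability at zero occurrences, so their curves start high and flatten almost immediately; larger means start low and rise gradually toward 1.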

Figure 2: Reliability vs Occurrences For Different Means


Where Has the Poisson Distribution Been Used?

Years ago it was used to estimate the number of body bags needed by the Prussian army, where cavalry soldiers were killed by kicks to the head from their horses. It has since been applied to absentee data, floods, arriving telephone calls, typographical errors, spare parts needed for component failures, lost time accidents, first aid calls, hurricanes, and a host of other applications that use the average number of expected events. Remember: some data is better than trying to make decisions with no data, and you’ll never have perfect data!

Instrument Calibration Results –
Every production facility has many instruments that are calibrated at periodic intervals to verify that they comply with standards within specified tolerances. Some instruments are of vital importance, some are of moderate importance, and many more are just indicators with wider allowed tolerances. While the instrument population is large, with mixed models, mixed uses, and mixed suppliers, all of the instruments must conform to their specified tolerance requirements. Failure to conform to the requirements counts as a calibration failure. Calibration intervals differ by instrument class and are set by $Risk = (probability of failure)*($consequence):
    1.  For crucial instrument requirements, the calibration interval is usually short (say, one-year calibration intervals).
    2.  For moderately important instrument requirements, the calibration interval is usually of moderate length (say, two-year calibration intervals).
    3.  For the bulk of instruments providing “general” information, the calibration interval is usually long (say, three-year calibration intervals).
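The $Risk rule can be sketched numerically. Assuming calibration failures follow a Poisson distribution with expected mean λ per interval, the probability of at least one failure in that interval is 1 - e^(-λ). The sketch uses λ = 0.018 (the minimum 1-year expected mean reported in the simulation findings) and a purely hypothetical consequence of $50,000; substitute your own values.

```python
import math

def prob_at_least_one_failure(mean: float) -> float:
    """Poisson probability of one or more failures: 1 - P(X = 0)."""
    return 1.0 - math.exp(-mean)

def dollar_risk(prob_failure: float, consequence: float) -> float:
    """$Risk = (probability of failure) * ($consequence)."""
    return prob_failure * consequence

mean = 0.018            # minimum 1-year expected mean from the simulation findings
consequence = 50_000.0  # HYPOTHETICAL cost of one out-of-tolerance event, in dollars

p = prob_at_least_one_failure(mean)
print(f"P(failure) = {p:.4f}, $Risk = ${dollar_risk(p, consequence):,.0f}")
```

Comparing $Risk across instrument classes, each with its own failure probability and consequence, is one way to justify shorter intervals for crucial instruments and longer ones for general indicators.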
Keeping detailed histories for each device becomes a record-keeping nightmare. Instruments that fail calibration are frequently marked with a yellow “dot”. An instrument that accumulates three yellow markers (it’s the old “three strikes and you’re out!”) is judged unfit for further use and is replaced with a more suitable, more durable instrument with a longer life. Here are the inspection results by year:

It is striking that, for each case, all the small Failed/Inspection ratios appear in the early years, while all the large Failed/Inspection ratios occur in the later years! Don’t you wonder why? This question is worth investigating.

For critical instruments, perhaps this is a call for adding redundant instruments to reduce the probability of failure. For example, commercial airliners, where the consequences of failure are high, carry redundant altimeters. Even with that redundancy, controlled flight into terrain is the number one cause of large aircraft failures; it is also the third leading cause of helicopter failures and the third leading cause of failures of general aviation aircraft, which usually do not have redundant altimeters (see http://www.barringer1.com/pdf/Essential_Elements_of_a_Successful_Reliability_Program_Updated_2_Per_Page.pdf). In dual-altimeter aircraft the altimeters must agree within tight limits, or else both altimeters go to the instrument shop for recalibration.

Summary Of Simulation Findings -
Here are the data from the different inspection intervals:

Figure 3: Failed/Inspection Ratios For Min. 1 Year Inspections (Expected Mean = 0.018)
Figure 4: Failed/Inspection Ratios For Max. 1 Year Inspections (Expected Mean = 0.039)

Figure 5: Failed/Inspection Ratios For Min. 2 Year Inspections (Expected Mean = 0.012)
Figure 6: Failed/Inspection Ratios For Max. 2 Year Inspections (Expected Mean = 0.027)

Figure 7: Failed/Inspection Ratios For Min. 3 Year Inspections (Expected Mean = 0.012)
Figure 8: Failed/Inspection Ratios For Max. 3 Year Inspections (Expected Mean = 0.026)

Figure 9: Failed/Inspection Ratios For Min. 10 Year Summary (Expected Mean = 0.013)
Figure 10: Failed/Inspection Ratios For Max. 10 Year Summary (Expected Mean = 0.027)
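The zero-occurrence reliability for each figure is the Poisson probability of no failures, R = e^(-λ), where λ is the expected mean. This Python sketch converts each expected mean listed for Figures 3 to 10 into that reliability; the dictionary labels are our own shorthand for the figure captions.

```python
import math

# Expected mean Failed/Inspection ratios from Figures 3 to 10
expected_means = {
    "Fig 3 (min, 1 yr)": 0.018,
    "Fig 4 (max, 1 yr)": 0.039,
    "Fig 5 (min, 2 yr)": 0.012,
    "Fig 6 (max, 2 yr)": 0.027,
    "Fig 7 (min, 3 yr)": 0.012,
    "Fig 8 (max, 3 yr)": 0.026,
    "Fig 9 (min, 10 yr)": 0.013,
    "Fig 10 (max, 10 yr)": 0.027,
}

# Reliability for zero occurrences: P(X = 0) = exp(-mean)
reliability = {name: math.exp(-m) for name, m in expected_means.items()}

for name, r in reliability.items():
    print(f"{name}: {r:.1%}")

print(f"range: {min(reliability.values()):.1%} to {max(reliability.values()):.1%}")
```

The final line prints a range of 96.2% to 98.8%, the span of zero-occurrence reliabilities used in the redundancy calculation.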

Figures 3 to 10 show zero-occurrence reliabilities between 96.2% and 98.8%. If the devices were used in parallel (independent, redundant devices, where the system fails only if every device fails), the reliabilities become:
For two in parallel:
$$R_2 = 1-(1-0.962)^2 = 1-(0.038)^2 = 1-0.001444 = 99.86\%$$
$$R_2 = 1-(1-0.988)^2 = 1-(0.012)^2 = 1-0.000144 = 99.9856\%$$
For three in parallel:
$$R_3 = 1-(1-0.962)^3 = 1-(0.038)^3 = 1-0.000054872 = 99.99\%$$
$$R_3 = 1-(1-0.988)^3 = 1-(0.012)^3 = 1-0.000001728 = 99.9998\%$$
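The parallel-redundancy arithmetic generalizes to R_n = 1 - (1 - R)^n for n identical devices, assuming they fail independently (common-cause failures would make real redundancy less effective than this formula suggests). A minimal Python check of the cases for the 96.2% and 98.8% reliabilities:

```python
def parallel_reliability(r: float, n: int) -> float:
    """Reliability of n independent devices in parallel: the system fails only if all n fail."""
    return 1.0 - (1.0 - r) ** n

for r in (0.962, 0.988):
    for n in (1, 2, 3):
        print(f"R={r}, n={n}: {parallel_reliability(r, n):.7%}")
```

Each added device multiplies the unreliability (1 - R) by another factor of (1 - R), which is why the gains from a third device are so large when single-device reliability is already high.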

Download a copy of this problem as a PDF file.  You can download a copy of the Excel Poisson Distribution.

Send your comments to Paul Barringer.

Last revised July 31, 2016
© Barringer & Associates, Inc. 2016