Problem Of The Month June 2001

Flood Data and Gumbel Largest Distributions



Floods are a reliability problem.  When you’ve got floodwater inside your house, you’ve got a big time failure!


Reliability successes always terminate in a failure.  It’s not WILL they terminate, it’s WHEN will the failure occur.  Nothing functions forever without a failure (want an illustration of the lack of forever?  Where are the seven wonders of the ancient world?).

As you can see from the flood photos below, you can also ask where is Houston (by the way, autos are around the 18-wheeler trucks but submerged!).


Ask the 25,000 homeowners in the Houston area how they feel about water in their house, plus another 10,000 who will make delayed reports about their flooding problems.  The June 9, 2001 Houston flood brought ~35 inches (89 cm) of rain in 5 days.  Roughly 24 inches (61 cm) fell in less than 24 hours.  The expected cost for the cleanup effort was US$1 billion (yes, billion), and on June 15, 2001 the damage estimate was raised to US$2 billion!  Houstonians think this is the worst it will ever be.  HA!  More bad news follows below.


The large amount of rain came from a tropical depression, which formed in a single day over the Gulf of Mexico and oscillated back and forth over the city rather than blowing off to the northeast in the customary weather pattern.  More specific data since the June 9, 2001 flood is available online, where you can enter your street address/ZIP for local details, and from the Harris County Flood Control District, which has specific details as the new flood plain information is developed for the 3.7 million people living within the watershed area of 1,756 square miles (454,802 hectares).  Harris County is the 3rd most populous county in the USA.


Daily stream flow data with gage heights for various waterways in the USA are available on the Internet.  One important set of data is the annual flood.  The single largest peak flow in one year is considered the reportable flood record.  Yes, you could have four really big flow rates in one year which are larger than all previously recorded data, but into the record book goes only the largest value for the year (it may not be fair, but that’s the way it has been done since the Egyptian days of annual floods).


You can use the flood data and WinSMITH Weibull software to make predictions about future water heights to decide if you need flood insurance.  The Gumbel largest (or upper) distribution is the tool of choice for floods. 


The Gumbel largest value cumulative distribution function (CDF) is expressed in Figure 1.  The equation is generically similar to the equations for the largest extreme value listed in Table 2 (page 24) of “Statistical Theory of Extreme Values and Some Practical Applications: A Series of Lectures” by Emil J. Gumbel, National Bureau of Standards Applied Mathematics Series 33, issued February 12, 1954 (order PB175818 from the National Technical Information Service of the U.S. Department of Commerce).


Gumbel is best known for his probability distributions concerning flood data, described in his book Statistics of Extremes (E. J. Gumbel, Columbia University Press, New York, 1958, ISBN 0-231-02190-9).  The book is available through the used book market and will tell you more than you wanted to know about extreme value statistics.


This is the Type 1 extreme value (Gumbel largest) cumulative distribution function.  X is the variable, Ψ is the location parameter (the mode of the distribution), and d is the scale parameter.  Small d’s give steep line slopes on Gumbel probability paper.  This equation is also known as the double exponential equation.
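For reference, the standard textbook form of the Gumbel largest CDF, written with this article’s symbols, is sketched below.  It is assumed here to match the equation in Figure 1; the assumption is at least consistent with the remark that small d’s give steep lines.

```latex
F(x) = \exp\!\left[-\exp\!\left(-\frac{x-\Psi}{d}\right)\right],
\qquad -\infty < x < \infty,\quad d > 0
```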


The equation in Figure 1 will plot as a straight line on Gumbel largest probability paper.  Conversion of Figure 1 into a straight line equation is shown in Figure 2.
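A sketch of that conversion, assuming the equation in Figure 1 has the standard form F(x) = exp[-exp(-(x - Ψ)/d)]: taking logarithms twice leaves an expression that is linear in x.

```latex
\begin{aligned}
F(x) &= \exp\!\left[-\exp\!\left(-\frac{x-\Psi}{d}\right)\right] \\
-\ln F(x) &= \exp\!\left(-\frac{x-\Psi}{d}\right) \\
-\ln\!\left[-\ln F(x)\right] &= \frac{x-\Psi}{d} \;=\; \frac{1}{d}\,x - \frac{\Psi}{d}
\end{aligned}
```

With y = -ln[-ln F(x)] on the vertical axis and x on the horizontal axis, the slope is 1/d and the line crosses y = 0 at x = Ψ.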



Peak flow rates come from the USGS stream flow station on Buffalo Bayou adjacent to downtown Houston, Texas, identified as station 08074000.  It is important to identify the measuring station on a physical map, as the verbal description and the location are sometimes widely separated.  This stream gage station carries some details about location and other interesting information:

            Harris County, Texas

            Hydrologic Unit Code 12040104

            Latitude 29°45’36”, Longitude 95°24’30” NAD27

            Drainage area 358 square miles (about 92,700 hectares)

            Contributing drainage area 358 square miles

            Gage datum –1.36 feet above sea level NGVD29

Peak stream flow along with estimates for missing gage heights plus updates for details not shown beyond 1998 are shown in Table 1:


Table 1: Annual Flood Data From USGS Station 08074000 – Houston, TX


The right hand side of Table 1 contains flood data in rank order.  Notice that the June 9 data is included (assuming it will be the greatest value for 2001, although it could be displaced by heavier rainfall during hurricane season).


Given the data in Table 1, what is the expected height of water at the gage station for the 100-year flood based on the Gumbel largest value distribution as shown in Figure 3? 


The flood data in Table 1 is scalar data, and we need data pairs (X-Y) to plot on a probability plot.  Benard’s median rank equation is (i - 0.3)/(N + 0.4), where “i” is the rank and “N” is the number of data points.  So the ranked data point 8.84 has an “i” value = 1, with “N” = 67 data points and no censored data.  Thus the X-Y coordinate is (8.84, 1.0386%).
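The pairing step can be sketched in a few lines of Python.  The function name is made up for illustration; only the formula and the 8.84-foot, N = 67 example come from the text, and the result is about 1.04%.

```python
def benard_median_rank(i: int, n: int) -> float:
    """Benard's approximation to the median rank of the i-th smallest of n points."""
    if not 1 <= i <= n:
        raise ValueError("rank i must satisfy 1 <= i <= n")
    return (i - 0.3) / (n + 0.4)


# The smallest of the 67 annual peaks (8.84 ft) gets rank i = 1.
y_percent = 100.0 * benard_median_rank(1, 67)
print(f"(8.84, {y_percent:.4f}%)")
```

Repeating the call for i = 2 … 67 gives the Y value for every ranked gage height in Table 1.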


When all scalar points are paired using the neat statistical device of Benard’s median rank equation, we get Figure 3, which is a probability plot of the flood data for Buffalo Bayou adjacent to downtown Houston using 67 data points (i.e., 67 years of experience).  Notice how the upper right hand corner of the graph magnifies the percentage of occurrence compared to the lower left hand corner.  Of course, you’re more concerned about the big floods than the minuscule floods, so the technology works to your advantage.
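How a trend line can be drawn through such a plot is sketched below.  This is not WinSMITH’s algorithm, which the article does not describe; it assumes a plain least-squares regression of gage height on the reduced variate y = -ln(-ln F), one common choice for probability plots.  The data are SYNTHETIC points generated from made-up parameters (Ψ = 17.0 ft, d = 6.0 ft), used only to show that the fit recovers the line that produced them; they are not Table 1.

```python
import math


def benard_median_rank(i, n):
    return (i - 0.3) / (n + 0.4)


def gumbel_reduced_variate(f):
    """y = -ln(-ln F): the linearized probability axis of Gumbel largest paper."""
    return -math.log(-math.log(f))


def fit_gumbel_line(heights):
    """Least-squares fit of x = psi + d*y to ranked data; returns (psi, d)."""
    xs = sorted(heights)
    n = len(xs)
    ys = [gumbel_reduced_variate(benard_median_rank(i, n)) for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    d = s_xy / s_yy
    psi = x_bar - d * y_bar
    return psi, d


# SYNTHETIC check data (hypothetical parameters, not the Buffalo Bayou record).
TRUE_PSI, TRUE_D, N = 17.0, 6.0, 20
synthetic = [
    TRUE_PSI + TRUE_D * gumbel_reduced_variate(benard_median_rank(i, N))
    for i in range(1, N + 1)
]
psi_hat, d_hat = fit_gumbel_line(synthetic)
print(f"psi = {psi_hat:.3f} ft, d = {d_hat:.3f} ft")
```

Real flood data will scatter around the line, so the recovered parameters depend on which regression and plotting position are used.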


Figure 3: Forecast Of The 100 Year Flood For USGS Station 08074000 – Houston, TX



Here’s how you read Figure 3 using the trendline (i.e., 50% confidence):

              3.59% of all floods will reach up to a maximum gage height of 10 feet

            55.44% of all floods will reach up to a maximum gage height of 20 feet

            88.87% of all floods will reach up to a maximum gage height of 30 feet (the flood of 2001 was 34.5 feet, flood of 1998 was 36.94 feet)

            97.80% of all floods will reach up to a maximum gage height of 40 feet (the flood of 1929 was 43.5 feet, the flood of 1935 was 49 feet)

            99.58% of all floods will reach up to a maximum gage height of 50 feet

            99.92% of all floods will reach up to a maximum gage height of 60 feet

            99.99% of all floods will reach up to a maximum gage height of 70 feet


To make the percentage values speak, you’ve got to convert the gage heights into money lost.  This requires a hypothesis.  Assume the relationship between gage heights and money lost will fit a semi-log plot where the money lost for a 20-foot flood is US$1,000,000, for a 34.5-foot flood is US$2,000,000,000, and for a 50-foot flood is US$5,000,000,000.  The relationship is shown in Figure 4.


Figure 4:  Assumed Flood Cost For High Water Levels


Combining the cost of a flood with the probability of exceeding its gage height gives the risk, where $Risk = (probability of exceeding the event)*($Consequence for the event):

            20 feet = US$1 @ 44.56% is ~US$1

            30 feet = US$1.29E+09 @ 11.1% is US$143,319,000

            40 feet = US$2.97E+09 @ 2.2% is US$65,340,000

            50 feet = US$5.00E+09 @ 0.42% is US$21,000,000

            60 feet = US$7.32E+09 @ 0.08% is US$5,856,000

So Houston got zapped near the peak exposure.
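The risk arithmetic can be sketched as follows, using the rounded costs and exceedance percentages as printed in the list.  Variable and function names are illustrative only.  Because the inputs are rounded, the 30-foot product comes out slightly different from the listed dollar figure, and the 20-foot row is left out because its risk is far smaller than the rest.

```python
# (gage height in feet, assumed cost in US$, probability of exceeding that height)
scenarios = [
    (30, 1.29e9, 0.111),
    (40, 2.97e9, 0.022),
    (50, 5.00e9, 0.0042),
    (60, 7.32e9, 0.0008),
]


def risk_dollars(cost, p_exceed):
    """$Risk = (probability of exceeding the event) * ($Consequence for the event)."""
    return p_exceed * cost


risks = {height: risk_dollars(cost, p) for height, cost, p in scenarios}
for height, risk in risks.items():
    print(f"{height} ft: US${risk:,.0f}")

# Height with the largest exposure among the listed scenarios.
peak_height = max(risks, key=risks.get)
print(f"largest $Risk at {peak_height} ft")
```

The 2001 flood at 34.5 feet sits close to the 30-foot row, which carries the largest $Risk of the four.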


The amount of money you can afford to spend to correct a problem of this magnitude is rather large but you can’t afford to cover all risk.


Since we have limited data, how do we forecast the data point that would represent the 100-year flood?


Gumbel provided a mechanism called the “return period” to answer questions about where the 100-year flood would be positioned on the probability plot.  The return period equation is RP = 1/(1-p), where p is the cumulative probability.


If the return period is 100, then p = 99%.  To estimate the 100-year flood, find the X value where the Y value = 99%.  For our data set, the answer is 44.76 feet as the forecasted gage height.  In a flat city like Houston, it’s not easy to gain this height from natural contours, so you have to build high and dry, which of course costs money (duh, like you didn’t have that already figured out).
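The conversion between return period and cumulative probability is small enough to sketch directly.  The function names are illustrative, not from any library.

```python
def probability_from_return_period(rp_years: float) -> float:
    """Cumulative probability p on the plot for a given return period (RP = 1/(1-p))."""
    if rp_years <= 1:
        raise ValueError("return period must exceed 1 year")
    return 1.0 - 1.0 / rp_years


def return_period_from_probability(p: float) -> float:
    """Return period in years for a cumulative probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)


for rp in (10, 50, 100):
    p = probability_from_return_period(rp)
    print(f"{rp:>3}-year flood -> read the trend line at p = {p:.2%}")
```

For the 100-year flood this gives p = 99%, the value used above to read 44.76 feet off the trend line.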


How do you “square” the projected 44.76-foot 100-year gage height against the recorded 49.0-foot data point?  First, the 49.0-foot data point was recorded before the gaging station was in place, so it probably contains errors.  Secondly, if we apply right hand confidence limits to the data at, say, 90%, the right hand confidence limit says the 44.76 number could be 48.6 feet, with a 5% chance the number is bigger (by the way, if you want more confidence and less chance for error, the confidence zone gets wider!!).


The value of the return period has an interesting interpretation: it is the average number of additional observations required to observe another value equal to or larger than the value of x on the trend line.  It is not a guarantee; in any span of years equal to the return period, the chance of seeing at least one such value is about 63%.  According to Gumbel, the most probable largest flood up to a certain time T increases as a linear function of the logarithm of time.


What’s the gage height expected for a 250-year flood?  250 = 1/(1-p) gives p = 99.6%, which on the trend line is 50.27 feet, or 54.8 feet with 90% right hand confidence.  At the rate Houston is growing, in 250 years the entire city will be one massive concrete jungle, so rain will run off faster, stream flow rates will increase, and floods will get increasingly high.
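The two trend-line readings quoted in the text (44.76 feet at p = 99% and 50.27 feet at p = 99.6%) are enough to back-calculate the straight line they sit on.  The sketch below assumes the standard Gumbel largest form F(x) = exp[-exp(-(x - Ψ)/d)]; the resulting Ψ and d are derived from those two readings, not taken from the WinSMITH output, and the function names are illustrative.

```python
import math


def reduced_variate(p):
    """y = -ln(-ln p), the linearized probability axis."""
    return -math.log(-math.log(p))


def params_from_two_quantiles(x1, p1, x2, p2):
    """Solve x = psi + d * y(p) for (psi, d) from two points on the trend line."""
    y1, y2 = reduced_variate(p1), reduced_variate(p2)
    d = (x2 - x1) / (y2 - y1)
    psi = x1 - d * y1
    return psi, d


def gage_height(psi, d, return_period):
    """Trend-line gage height for a given return period in years."""
    p = 1.0 - 1.0 / return_period
    return psi + d * reduced_variate(p)


def cdf(psi, d, x):
    """Fraction of annual floods expected to stay at or below gage height x."""
    return math.exp(-math.exp(-(x - psi) / d))


# Back-calculated from the 100-year and 250-year readings in the text.
psi, d = params_from_two_quantiles(44.76, 0.99, 50.27, 0.996)
print(f"back-calculated psi = {psi:.2f} ft, d = {d:.2f} ft")
print(f"F(30 ft) = {cdf(psi, d, 30.0):.2%}, F(40 ft) = {cdf(psi, d, 40.0):.2%}")
```

As a cross-check, the line recovered this way lands close to the 30-foot and 40-foot percentages read from Figure 3 earlier.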


In short, flooding has been bad in Houston and it’s going to get worse unless we apply new technology.  The time sequence of flood data at this gaging station shows an increasing trend toward greater floods because of the concrete jungle.  When the Houston population has had enough misery, they’ll stop building in flood prone areas and insist on better flood control mechanisms.  Pain has a strange way of getting people to respond in a responsible manner.


Flow dynamics of rainfall and floods are important items to consider.  The probability plot shown above is not the only consideration for building programs for roads, buildings, dams, and flood control systems.  Prudence says build tall, build strong, add freeboard to stay above the waves, and expect that mother nature will always have big, expensive surprises for you in the future (no guarantees!).  Another thought to consider:  If you’re going to do everything on the basis of cheap first cost (building in the flood plain, building without adequate safety factors for mother nature), you can expect your life cycle costs to generally be higher than doing it right the first time.


Local flood conditions again illustrate the value of technology tools (Gumbel probability plots) to anticipate what’s going to happen and get prepared.  Unfortunately, being forewarned does not keep humans from continuing to build low and then wondering why their houses get repeatedly flooded.


Related Gumbel distributions:

See the Gumbel smallest value distribution for another example of how to use extremes in data.  The smallest value example shows how to use a minimum data point for each component to make important decisions.


Refer to the caveats on the Problem Of The Month Page about the limitations of the following solution. Maybe you have a better idea on how to solve the problem. Maybe you find where I've screwed-up the solution and you can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.

Technical tools are only interesting toys for engineers until results are converted into a business solution involving money and time. Complete your analysis with a bottom line which converts $'s and time so you have answers that will interest your management team!


Last revised 4/2/2009
© Barringer & Associates, Inc. 2001
