Problem Of The Month June 2001
Flood Data and Gumbel Largest Distributions
Floods are a reliability problem. When you’ve got floodwater inside your house, you’ve got a big time failure!
Reliability successes always terminate in a failure. The question is not WILL they terminate but WHEN the failure will occur. Nothing functions forever without a failure. (Want an illustration of the lack of forever? Where are the seven wonders of the ancient world?)
As you can see from the flood photos below, you can also ask
Ask the 25,000 homeowners in the
The large amount of rain came from a
tropical depression, which formed in a single day over the
Daily stream flow data with gage depth heights for various
waterways in the
You can use the flood data and WinSMITH Weibull software to make predictions about future water heights to decide if you need flood insurance. The Gumbel largest (or upper) distribution is the tool of choice for floods.
The Gumbel largest value cumulative distribution function (CDF) is given in Figure 1. The equation is generically similar to the equations for the largest extreme value listed in Table 2 (page 24) of “Statistical Theory of Extreme Values and Some Practical Applications: A Series of Lectures” by Emil J. Gumbel, National Bureau of Standards Applied Mathematics Series 33, issued February 12, 1954 (order PB175818 from the National Technical Information Service of the U.S. Department of Commerce).
Gumbel is best known for his probability distributions
concerning flood data described in his book: Statistics of Extremes, E. J. Gumbel, Columbia University Press,
This is the type 1 extreme value distribution, the Gumbel largest extreme cumulative distribution function. X is the variable, Ψ is a scale factor, and d is a shape factor; in standard statistical terminology a Gumbel distribution has a location parameter and a scale parameter. Small values of d give steep line slopes on Gumbel probability paper. This equation is also known as the double exponential equation.
The equation in Figure 1 will plot as a straight line on Gumbel largest probability paper. Conversion of Figure 1 into a straight line equation is shown in Figure 2.
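Figures 1 and 2 are not reproduced in this text, so the following is only a minimal sketch that assumes the standard textbook form of the Gumbel largest CDF, F(x) = exp(-exp(-(x - Ψ)/d)), in which Ψ positions the line and d sets its slope. The function names and the parameter values are hypothetical and chosen for illustration; they are not fitted to the flood data.

```python
import math

def gumbel_largest_cdf(x, psi, d):
    """Assumed standard form: F(x) = exp(-exp(-(x - psi) / d))."""
    return math.exp(-math.exp(-(x - psi) / d))

def reduced_variate(p):
    """Linearizing transform y = -ln(-ln(p)) for the probability axis."""
    return -math.log(-math.log(p))

# Hypothetical parameters for illustration only (not fitted to Table 1).
psi, d = 17.0, 6.0

for x in (10.0, 20.0, 30.0):
    p = gumbel_largest_cdf(x, psi, d)
    # After the transform, each point falls on the straight line y = (x - psi) / d.
    print(x, round(p, 4), round(reduced_variate(p), 4), round((x - psi) / d, 4))
```

The last two printed columns agree for every x, which is the sense in which the CDF plots as a straight line. With x on the horizontal axis the slope is 1/d, so a smaller d gives a steeper line.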
Peak flow rates for the USGS
stream flow station on Buffalo Bayou adjacent to downtown
Hydrologic Unit Code 12040104
Latitude 29°45′36″, Longitude 95°24′30″ NAD27
Drainage area 358 square miles (or 93,100 hectares)
Contributing drainage area 358 square miles
Gage datum –1.36 feet above sea level NGVD29
Peak stream flow along with estimates for missing gage heights plus updates for details not shown beyond 1998 are shown in Table 1:
Table 1: Annual Flood Data From USGS Station
The right hand side of Table 1 contains flood data in rank order. Notice the June 9 data (assuming it will be the greatest value for 2001—but it could be displaced by heavier rainfall during hurricane season).
Given the data in Table 1, what is the expected height of water at the gage station for the 100-year flood based on the Gumbel largest value distribution as shown in Figure 3?
The flood data in Table 1 is scalar data, and we need data pairs (X-Y) to plot on a probability plot. Benard’s median rank equation is (i – 0.3)/(N + 0.4), where “i” is the rank and “N” is the number of data points. So the lowest ranked data point, 8.84, has “i” = 1 and “N” = 67 data points without any censored data. Thus the X-Y coordinate is (8.84, 1.0385%).
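As a minimal sketch of the plotting-position arithmetic, the function below evaluates Benard’s median rank for any rank i out of N points. The function name is hypothetical; only the formula and N = 67 come from the text.

```python
def benard_median_rank(i, n):
    """Benard's approximation to the median rank of the i-th of n ordered points."""
    return (i - 0.3) / (n + 0.4)

n = 67  # number of annual flood values in Table 1, none censored

lowest = benard_median_rank(1, n)   # plotting position for the smallest value (8.84 ft)
highest = benard_median_rank(n, n)  # plotting position for the largest value

print(f"rank 1:  {lowest:.4%}")
print(f"rank {n}: {highest:.4%}")
```

Rank 1 lands at about 1.04%, and rank 67 falls a little short of 100%, so the largest recorded flood still plots at a finite position on the probability scale.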
When all scalar points are paired with the neat statistical
device of Benard’s median rank equation, we get Figure 3, which is a probability
plot of the flood data for Buffalo Bayou adjacent to downtown
Figure 3: Forecast Of The 100 Year Flood For USGS
Station 08074000 –
Here’s how you read Figure 3 using the trendline (i.e., 50% confidence):
3.59% of all floods will reach up to a maximum gage height of 10 feet
55.44% of all floods will reach up to a maximum gage height of 20 feet
88.87% of all floods will reach up to a maximum gage height of 30 feet (the flood of 2001 was 34.5 feet, flood of 1998 was 36.94 feet)
97.80% of all floods will reach up to a maximum gage height of 40 feet (the flood of 1929 was 43.5 feet, the flood of 1935 was 49 feet)
99.58% of all floods will reach up to a maximum gage height of 50 feet
99.92% of all floods will reach up to a maximum gage height of 60 feet
99.99% of all floods will reach up to a maximum gage height of 70 feet
To make the percentage values speak, you’ve got to convert the gage heights into money lost. This requires a hypothesis. Assume the relationship between gage heights and money lost will fit a semi-log plot where the money lost for a 20-foot flood is US$1,000,000, for a 34.5-foot flood the cost is US$2,000,000,000, and for a 50-foot flood the cost is US$5 billion. The relationship is shown in Figure 4.
Figure 4: Assumed Flood Cost For High Water Levels
Combining the cost of a flood with the probability of exceeding the gage height, where $Risk = (probability of exceeding the event)*($Consequence For The Event), gives:
20 feet = US$1 @ 44.56% is ~US$1
30 feet = US$1.29E+09 @ 11.1% is US$143,319,000
40 feet = US$2.97E+09 @ 2.2% is US$65,340,000
50 feet = US$5.00E+09 @ 0.42% is US$21,000,000
60 feet = US$7.32E+09 @ 0.08% is US$5,856,000
So Houston got zapped near the peak exposure.
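The risk arithmetic can be sketched as follows, using the rounded consequence and probability values quoted in the list above. The variable and function names are hypothetical, and the 20-foot row is left out.

```python
def dollar_risk(consequence, p_exceed):
    """$Risk = (probability of exceeding the event) * ($Consequence for the event)."""
    return consequence * p_exceed

# (gage height in feet, assumed consequence in US$, probability of exceeding the height)
rows = [
    (30, 1.29e9, 0.111),
    (40, 2.97e9, 0.022),
    (50, 5.00e9, 0.0042),
    (60, 7.32e9, 0.0008),
]

for height, consequence, p_exceed in rows:
    print(f"{height} feet: US${dollar_risk(consequence, p_exceed):,.0f}")

# Height with the largest $Risk among the rows considered.
peak_height = max(rows, key=lambda r: dollar_risk(r[1], r[2]))[0]
print("peak exposure at", peak_height, "feet")
```

With these rounded inputs the 40-, 50-, and 60-foot rows reproduce the quoted figures, while the 30-foot row comes out near US$143.2 million, slightly below the quoted value, presumably because the quoted product used unrounded inputs. The peak still falls on the 30-foot row.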
The amount of money you can afford to spend to correct a problem of this magnitude is rather large but you can’t afford to cover all risk.
Since we have limited data, how do we forecast the data point that would represent the 100-year condition?
Gumbel provided a mechanism called the “return period” to answer questions about where would the 100-year floods be positioned on the probability plot. The return period equation is RP = 1/(1-p) where p is the cumulative probability.
If the return period is 100, then p = 99%. To estimate the 100-year flood, find the X value where the Y value = 99%. For our data set, the answer is a forecasted gage height of 44.76 feet. In a flat city like Houston, it’s not easy to gain this height from natural contours, so you have to build high and dry, which of course costs money (duh, like you didn’t have that already figured out).
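The return period relationship RP = 1/(1-p) is easy to invert. Here is a small sketch with hypothetical function names, showing both directions of the conversion.

```python
def cumulative_probability(return_period):
    """Invert RP = 1 / (1 - p) to get p = 1 - 1 / RP."""
    return 1.0 - 1.0 / return_period

def return_period(p):
    """RP = 1 / (1 - p), where p is the cumulative probability."""
    return 1.0 / (1.0 - p)

for rp in (10, 50, 100, 250):
    print(f"{rp}-year flood: p = {cumulative_probability(rp):.2%}")
```

The 100-year flood corresponds to p = 99%, as stated above, and the same conversion applies to any other return period.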
How do you “square” the projected 44.76-foot 100-year gage height against the recorded 49.0-foot data point? First, the 49.0-foot data point was recorded before the gaging station was in place, so it probably contains errors. Second, if we apply right hand confidence limits to the data at, say, 90%, the right hand confidence limit says the 44.76 number could be 48.6 feet with a 5% chance the number is bigger (by the way, if you want more confidence and less chance for error, the confidence zone gets wider!).
The value of the return period has an interesting interpretation: it is the number of additional observations in which, on average, one value equal to or larger than the value of x on the trend line is expected. The most probable largest flood up to a certain time T increases as a linear function of the logarithm of time, per a direct quote from Gumbel.
What’s the gage height expected for a 250-year flood? 250 = 1/(1-p) gives p = 99.6%, which on the trend line is 50.27 feet, or 54.8 feet with 90% right hand confidence. At the rate Houston is growing, in 250 years the entire city will be one massive concrete jungle, and faster rain runoff will drive water flow rates up and produce increasingly high floods.
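The fitted trend line parameters are not listed in the text, but a straight line on Gumbel paper is fixed by any two points. The sketch below backs the line out of the two trend-line readings quoted above (44.76 feet at p = 99% and 50.27 feet at p = 99.6%), assuming the standard Gumbel largest form x = Ψ + d·(-ln(-ln p)). This is an illustration of the arithmetic, not the WinSMITH Weibull fit, and the function names are hypothetical.

```python
import math

def reduced_variate(p):
    """y = -ln(-ln(p)), the linear scale of Gumbel largest probability paper."""
    return -math.log(-math.log(p))

def line_through(reading_1, reading_2):
    """Solve x = psi + d * y from two (p, x) readings on the trend line."""
    (p1, x1), (p2, x2) = reading_1, reading_2
    y1, y2 = reduced_variate(p1), reduced_variate(p2)
    d = (x2 - x1) / (y2 - y1)
    psi = x1 - d * y1
    return psi, d

def gage_height(rp, psi, d):
    """Trend-line gage height (feet) for a return period of rp years."""
    p = 1.0 - 1.0 / rp
    return psi + d * reduced_variate(p)

# Readings quoted in the text; psi and d are back-solved, not published values.
psi, d = line_through((0.99, 44.76), (0.996, 50.27))
print(f"psi ~ {psi:.2f} ft, d ~ {d:.2f} ft")

for rp in (50, 100, 250):
    print(f"{rp}-year flood: {gage_height(rp, psi, d):.2f} ft")
```

By construction the 100-year and 250-year rows return the quoted heights; any other return period is only as good as the straight-line assumption and the two readings used.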
In short, flooding has been bad in Houston and it’s going to get worse unless we apply new technology. The time sequence of flood data at this gaging station shows a trendline increasing toward greater floods because of the concrete jungle. When the Houston population has had enough misery, they’ll stop building in flood prone areas and insist on better flood control mechanisms. Pain has a strange way of getting people to respond in a responsible manner.
Flow dynamics of rainfall and floods are important items to consider. The probability plot shown above is not the only consideration for building programs for roads, buildings, dams, and flood control systems. Prudence says build tall, build strong, add freeboard to stay above the waves, and expect that mother nature will always have big, expensive surprises for you in the future. No guarantees! Another thought to consider: if you’re going to do everything on the basis of cheap first cost (building in the flood plain, building without adequate safety factors for mother nature), you can expect your life cycle costs to generally be higher than doing it right the first time.
Local flood conditions again illustrate the value of technology tools (Gumbel probability plots) to anticipate what’s going to happen and get prepared. Unfortunately, being forewarned does not stop people from continuing to build low and then wondering why their houses get repeatedly flooded.
Related Gumbel distributions:
See the Gumbel smallest value distribution for another example of how to use extremes in data. The smallest value example shows how to use a minimum data point for each component to make important decisions.
Refer to the caveats on the Problem Of The Month Page about the limitations of the following solution. Maybe you have a better idea on how to solve the problem. Maybe you’ll find where I’ve screwed up the solution and can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.
Technical tools are only interesting toys for engineers until results are converted into a business solution involving money and time. Complete your analysis with a bottom line which converts $'s and time so you have answers that will interest your management team!