How Many Heat Exchanger Tubes Should I Inspect?

We have a heat exchanger that has been in service for some time.  How many tubes should I inspect to estimate the number of failures (tubes that must be plugged to take the leaks out of service)?  This is a sampling problem: gain the maximum information while keeping inspection costs low.

The short, simple answer is to sample where you gain the most information:
              If many failures exist, inspect only a few tubes because they’re easy to find. 
              If few failures exist, inspect many tubes because they’re very difficult to find. 
              If the problems are homogeneous and randomly scattered through the bundle, choose samples for inspection by use of random numbers.
              If the problems are stratified/local/batch, choose the sample by use of random numbers from the population that might be afflicted.
Of course you know the maximum number of failures that could exist: it is the number of tubes in the tube bundle.  Making wise sampling decisions requires prior knowledge of the affliction, which allows a Bayesian approach to gain maximum information.

Why the emphasis on random sampling?  Random samples support statistically valid inference about the population, whereas samples chosen by convenience or judgment generally do not.  How do you decide which tubes to inspect using random numbers?
              Number all the tubes, which also defines the population.
              Decide how many tubes you want to sample.
              Download the Excel spreadsheet, open the tab labeled Get Random Number, and inspect the tube numbers listed.
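If you don't have the spreadsheet handy, the same three steps can be sketched in a few lines of Python.  This is only a stand-in for the Get Random Number tab, not a copy of it: the function name pick_tubes, the 1000-tube bundle, the sample size of 50, and the "suspect" block of tubes 1 to 200 are all assumptions made up for illustration.

```python
import random


def pick_tubes(tube_numbers, sample_size, seed=None):
    """Return a sorted random sample of tube numbers, drawn without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the inspection list reproducible
    return sorted(rng.sample(list(tube_numbers), sample_size))


# Step 1: number all the tubes (assumed here: a 1000-tube bundle).
all_tubes = range(1, 1001)

# Step 2: decide how many tubes to sample (assumed here: 50).
# Step 3: draw the tube numbers to inspect.
homogeneous_sample = pick_tubes(all_tubes, 50, seed=2004)

# Stratified/local problem: draw only from the tubes that might be afflicted
# (hypothetical: tubes 1 to 200 sit in the suspect zone).
suspect_tubes = range(1, 201)
stratified_sample = pick_tubes(suspect_tubes, 50, seed=2004)

print(homogeneous_sample[:10])
print(stratified_sample[:10])
```

Drawing without replacement means no tube number appears twice on the inspection list.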

Be careful about inspecting a fixed percentage of tubes:
              The size of the sample is more important than the percentage of the lot.
Sampling 500 tubes (10%) from a heat exchanger that contains 5000 tubes yields more useful information than sampling 10 tubes (10%) from a 100-tube heat exchanger.  Sampling evaluates a small portion of a population to infer useful information about the whole population while saving time and money.
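One way to see why sample size matters more than percentage is to compare the uncertainty of the estimated defect fraction for the two cases.  The sketch below uses the textbook standard error of a proportion with a finite population correction; the 5% true defect fraction is purely an assumption for illustration, not a value from this problem.

```python
import math


def standard_error(defect_fraction, sample_size, population):
    """Standard error of a sampled defect fraction, with finite population correction."""
    binomial_se = math.sqrt(defect_fraction * (1 - defect_fraction) / sample_size)
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return binomial_se * fpc


ASSUMED_FRACTION = 0.05  # hypothetical true defect fraction, for illustration only

se_small = standard_error(ASSUMED_FRACTION, sample_size=10, population=100)
se_large = standard_error(ASSUMED_FRACTION, sample_size=500, population=5000)

print(f"10 of 100 tubes:   estimate uncertain by about +/- {se_small:.3f}")
print(f"500 of 5000 tubes: estimate uncertain by about +/- {se_large:.3f}")
```

Both cases sample 10% of the lot, yet the 500-tube sample pins down the defect fraction several times more tightly than the 10-tube sample.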

Give consideration to the inspection level, which sets how stringent the acceptance criteria are:
              If failure costs are low and contamination is a non-issue, use less stringent inspection.
              If failure costs are high and contamination involves lethal conditions, use very stringent inspection.
Use common sense—one inspection method does not fit all conditions.

Consider this data in Table 1 where every tube has been inspected at least once:

Table 1 shows tube failures are rapidly increasing with age.  Mean time to failure is “falling like a rock” as shown in the last column!  Note that 89/118 = 75% of the failures are due to blockage.

Generally when heat exchangers have lost 10% of the tubes they become ineffective and are ready for replacement as this becomes an economic problem.  Given the data in Table 1 at 2799 days and 3225 days, could we have predicted when failure would occur with the cumulative loss of 10% of the tubes (i.e., 100 tubes or 43 additional tubes)?

We have mixed failure modes (blockage and thinning walls) which are competing to kill the tubes.  We have coarse data as we only inspect at intervals where we discover failures.  This means our data has some deficiencies.  How much more time will elapse until the loss of 57 tubes has grown to 100 tubes (given we have 943 survivors at age 3225 days)?  The forecast at 3225 days will be measured against the facts in-hand at day 3500.

To make a Weibull plot of all the failure data at age 3225 days, we can use a Weibull probit format.  Probit is used because it regresses Y on X, as the uncertainty is greater in the Y direction.  Why the uncertainty?  We may not find pits or small cracks using less discriminating (but cheap) eddy current inspection.  IRIS inspection (expensive) would be the preferred flaw detection method as the rotating ultrasonic head can detect smaller defects, which would reduce the Y-axis uncertainty.  However, IRIS inspection also requires very clean surfaces, which are unlikely where deposits on the tube wall are causing blockage in the tube.

The standard Weibull analysis format using the inspection option would be appropriate if we had reasonable ability to find the defects with better inspection methods.   Also note that the aggregate failures come from mixed failure modes. 
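For readers who want to see what "regressing Y on X" means in Weibull coordinates, here is a minimal least-squares sketch.  It is not WinSMITH Weibull's probit routine (whose internal details I am not reproducing here); it simply fits a straight line to Y = ln(-ln(1-F)) versus X = ln(age), where F = cum quantity affected / quantity sampled.  The demonstration points are synthetic: they are generated from a Weibull with eta = 5200 and beta = 5.934 at the three inspection ages, so the fit should simply return those parameters.  They are not the Table 1 counts.

```python
import math


def fit_weibull_y_on_x(points):
    """Fit eta and beta by least squares of Y on X.

    points: iterable of (age, cumulative failures, quantity sampled).
    X = ln(age), Y = ln(-ln(1 - F)) with F = cumulative failures / quantity sampled.
    """
    xs = [math.log(age) for age, _, _ in points]
    ys = [math.log(-math.log(1.0 - failed / sampled)) for _, failed, sampled in points]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope of the Weibull line
    eta = math.exp(x_bar - y_bar / beta)  # from intercept = -beta * ln(eta)
    return eta, beta


def weibull_cdf(age, eta, beta):
    return 1.0 - math.exp(-((age / eta) ** beta))


# Synthetic check data (NOT the Table 1 counts): exact Weibull fractions at the
# three inspection ages, scaled to a quantity sampled of 1000.
synthetic = [(age, 1000 * weibull_cdf(age, 5200.0, 5.934), 1000) for age in (2799, 3225, 3500)]

eta_fit, beta_fit = fit_weibull_y_on_x(synthetic)
print(f"eta = {eta_fit:.1f}, beta = {beta_fit:.3f}")
```

With real inspection data the points will not sit exactly on a line, and the Y-on-X choice puts all of the scatter on the fraction-failed axis, which is where the inspection uncertainty lives.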

Use WinSMITH Weibull (you can download a demo version and import the ZIPPED files for each figure which will allow the demo version to function with fidelity—if you type in the data into the demo version it will slightly randomize your input information).  Under the methods icon (bottom row, third from the left) select the probit method (bottom row, 5th from the right).  Enter your data in the format of Probit(2) as time datum * cum quantity affected * quantity sampled which would result in a data entry that looks like this:


Note the quantity sampled is fixed at 1000, not the 975 inspected at this date of 3225 days.

The data gives a mixed failure Weibull of eta = 5200 with beta = 5.934 (on heat exchangers, we typically see betas between 3 and 13 although exceptions can be expected, thus the beta = 5.934 seems reasonable).  We’ll use this to make a forecast for when the next 43 tubes will fail following inspection #2 (remember we already have inspection #3 in the bag so we can judge how good the preliminary data is for our fearless forecast; also remember tube life is falling like a rock).  Figure 1 shows the Weibull plot after the second inspection:

Two methods are available for predicting the time at which 10% of the tubes will fail.
            1.  Enter the predict feature of WinSMITH Weibull (top row of icons, 4th from the right), click on the trend line for beta = 5.934, enter the % Occurrence (for 10% = 57 actual failures + 43 future failures) and find the predicted time for 10% of the population to fail is 3559 days.
            2.  Use the Abernethy risk forecast (top row of icons, 3rd from the right) which requires some elaboration because it’s more complex than needed for this example but useful for many complex situations. 
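The trend-line prediction in method 1 can be cross-checked by hand with the Weibull equation solved for time, t = eta * (-ln(1 - F))^(1/beta).  The short sketch below does that with the eta and beta quoted above; the function name is my own.

```python
import math


def weibull_time_at_fraction(eta, beta, fraction_failed):
    """Age at which the given cumulative fraction of the population has failed."""
    return eta * (-math.log(1.0 - fraction_failed)) ** (1.0 / beta)


t10 = weibull_time_at_fraction(eta=5200.0, beta=5.934, fraction_failed=0.10)
print(f"10% of the tubes are expected to have failed at about {t10:.0f} days")
```

Because eta and beta are rounded values read from the plot, expect the hand calculation to agree with the software to within a day or so.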

For the Abernethy risk forecast, put down a clean sheet on WinSMITH Weibull, top row of icons on the left.  Reset WinSMITH Weibull to the standard method (under the Methods icon, top row, third from the left).  In the data sheet, input the survivors at a censored age of
            -3225 days *943 survivors
as all future failures will come from these survivors.  Under the Abernethy risk forecast (top row of icons, 3rd from the right), input the Weibull trend line as eta=5200 and beta=5.934 in Option D.  Set the forecast horizon to 60 weeks in Option E by choosing submenu option F.  Allow no renewal replacements in Option R.  Set the usage rate in option U via Option D to get the time in days.  Then click on the green check mark for the analysis.

You will see the forecast of 43 more failures will occur between 47 and 48 weeks as shown in Figure 2.  Interpolating the data in Figure 2, expect 43 failures to occur 47.6 weeks = 333 days into the future, where cum time is forecasted to be 3225+333=3558 days (essentially the same as the 3559 days found from the first method of simply predicting from the trend line of two data points).  Remember this two data point forecast involves extrapolation beyond the acquired data with only two data points.  The actual data at cum time 3500 days showed 118 failures detected, which says the forecast was too optimistic: the 100th failure (the 43rd future failure) occurred before day 3500, earlier than the forecast of about 3558 days.
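The idea behind the risk forecast can be sketched without the software.  The sketch below is my simplified reading of it, not the Abernethy risk routine itself: each of the 943 survivors at age 3225 days has a conditional probability of failing by a later age, and the expected number of failures is the survivor count times that probability, with no renewals.  Function and variable names are my own.

```python
import math

ETA, BETA = 5200.0, 5.934           # Weibull trend line from the two-point fit
AGE_NOW, SURVIVORS = 3225.0, 943    # survivors at the second inspection


def expected_failures(days_ahead):
    """Expected failures among the survivors within days_ahead (no renewals)."""
    h_now = (AGE_NOW / ETA) ** BETA
    h_future = ((AGE_NOW + days_ahead) / ETA) ** BETA
    conditional_probability = 1.0 - math.exp(-(h_future - h_now))
    return SURVIVORS * conditional_probability


def first_week_reaching(target, horizon_weeks=60):
    """First whole week in which the expected failures reach the target."""
    for week in range(1, horizon_weeks + 1):
        if expected_failures(7 * week) >= target:
            return week
    return None


week = first_week_reaching(43)
print(f"week {week - 1}: {expected_failures(7 * (week - 1)):.1f} expected failures")
print(f"week {week}: {expected_failures(7 * week):.1f} expected failures")
```

Run with the rounded eta and beta, the 43rd expected failure falls between weeks 47 and 48, in line with Figure 2.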

The Weibull plot of the third inspection interval gives this data set:
Figure 3 shows the plot with 10% failures forecasted to occur on day 3449 from the three data points.  This data now involves interpolation along the line rather than extrapolation beyond the data as occurred with only two data points:

Notice the trend lines are getting steeper and eta is moving to shorter life which emphasizes the accelerating problem of shortening MTTF expressed above as “falling like a rock”. 

So how do you overcome the small data sets to make better forecasts?  We need a Weibull library of failure data with well defined betas.  Suppose our library of past experience told us the beta should be 7.029.  Now using the data from Figure 1 and imposing the beta = 7.029 we can make a Weibayes forecast to get the Weibull plot shown in Figure 4 (notice all trend lines have the same slope with beta = 7.029).  Compare the results for where 10% of the tubes would fail:
            1) Two data point forecast for Figure 1 = 3558 days (extrapolation of trend line)
            2) Three data point forecast for Figure 3 = 3449 days (interpolation of trend line)
            3) Two data point + Weibull library forecast for Figure 4 = 3466 days (extrapolation of trend line)
Having a valid Weibull library is an important engineering tool for making important decisions—this is how you put the practice of engineering into a working tool with limited data.  If you don’t have a Weibull library, get one!

Out of 943 survivors at age 3225 days how many tubes would you have to inspect to find 43 expected failures?  This says the expected defective percentage is 43*100/943 = 4.56%.  Use Chapter 8 of The New Weibull Handbook for the binomial model. 

The binomial model is used for inspection outcomes that are either/or (good/bad, heads/tails, defective/not defective, hence binomial), where the number of trials n is large, the probability of failure p is small, and each trial (inspection) is independent with random occurrences, which means no memory from one inspection event to the next.  Suppose we wanted to sample inspect the heat exchanger tubing allowing zero failures.  How many tubes would we have to inspect?  Of course this requires us to ask what risk we will allow for getting the wrong answer.  Let’s assume we will allow a 1% chance of error.

To find the number of tubes you must sample inspect, use the calculator icon of WinSMITH Weibull (which will function with fidelity on the demonstration version).  Click on the calculator icon, click on the Binomial option B, click on cumulative probability option C, click option P and set the probability for 4.56%, click on the allowed event quantity, option E and set to 0 (zero), increase the trial quantity option N until the cumulative probability in option C goes to below 1% risk of accepting the sample. 
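              At the 1% risk level, this requires inspecting 99 tubes with 0 defects discovered (0.9544^99 is about 0.0098).
              If you allow up to 1 defect to be discovered, you must inspect 143 tubes to keep the risk below 1%.
              The larger samples of 198 tubes (0 defects allowed) and 253 tubes (up to 1 defect allowed) correspond to a much tighter risk of 0.01%, i.e., a cumulative probability below 0.0001.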
With sample inspection you did nothing to change the underlying statistic expecting 4.56% defects (43 defects out of 943 surviving tubes).  If the sample fails the inspection (you find more defects than allowed), you must do more inspections to eliminate the defects.
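The calculator steps can be checked with a few lines of code.  This is a sketch of the same binomial search, not the WinSMITH Weibull calculator: it increases the number of tubes inspected until the cumulative probability of finding no more than the allowed number of defects drops below the chosen risk.  Function names are my own.  It is run at two risk levels, because the required sample size is very sensitive to whether the risk is entered as 0.01 (1%) or 0.0001 (0.01%).

```python
from math import comb


def acceptance_probability(n, p, allowed):
    """Binomial probability of finding `allowed` or fewer defects in n inspections."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(allowed + 1))


def min_sample_size(p, allowed, risk):
    """Smallest n whose acceptance probability is below the stated risk."""
    n = allowed + 1
    while acceptance_probability(n, p, allowed) >= risk:
        n += 1
    return n


P_DEFECT = 0.0456  # 43 expected defects among 943 survivors

for risk in (0.01, 0.0001):
    for allowed in (0, 1):
        n = min_sample_size(P_DEFECT, allowed, risk)
        print(f"risk {risk:.2%}, up to {allowed} defect(s) allowed: inspect {n} tubes")
```

At a 1% risk the search stops at 99 tubes (0 defects allowed) and 143 tubes (1 defect allowed); at a 0.01% risk it stops at 198 and 253 tubes.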

Don’t be misled into thinking that 100% inspection finds 100% of the problems!  Time and time again, tests show 100% inspection is only ~80% effective.  This says:
              First inspection finds 80%*43 = 34 with 9 defects remaining.
              Second inspection finds 80%*9 = 7 with 2 defects remaining
              Third inspection finds 80%*2 = 2 with no defects remaining (you hope)
Of course in real life you never know exactly how many defects exist in a defective system so it’s hard to do the math up front unless you build your Weibull distributions to make a forecast or else you must go to heroic lengths to find the defects by multiple 100% inspections.  Since you’re unlikely to believe this entire paragraph, I have an inspection test for you to take. 
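The arithmetic of repeated 100% inspections is easy to reproduce.  The sketch below assumes, as the text does, that every pass finds 80% of whatever defects remain and rounds to whole tubes; the function name is my own and the 80% figure is the rule of thumb quoted above, not a measured value for any particular inspection method.

```python
def repeated_inspection(defects, effectiveness=0.80, passes=3):
    """Return (pass number, defects found, defects remaining) for each 100% inspection."""
    history = []
    remaining = defects
    for inspection in range(1, passes + 1):
        found = round(remaining * effectiveness)  # whole tubes only
        remaining -= found
        history.append((inspection, found, remaining))
    return history


for inspection, found, remaining in repeated_inspection(43):
    print(f"inspection {inspection}: found {found}, remaining {remaining}")
```

Without rounding, the expected number of defects left after three passes is 43 * 0.2^3, or about 0.34 of a tube, which is why the third line ends with "you hope".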

From the Excel worksheet, you can choose the Number Test tab.  If you’re better at reading, you can choose an alphabetical test by clicking on the Alphabetical Test tab.  Both tests are for a 5 minute time interval (you never have enough time to complete most tasks).  Score your results before you look at the answers.  When you press the F9 key, a fresh test appears for your second trial since I’m sure you’ll be embarrassed that you did so poorly knowing that any idiot should be able to pass this simple little inspection test.

What’s the purpose of the Excel file?  It exists so you can conduct your own test and score your own personal results to once again demonstrate that 100% inspection is not 100% effective (even though you believe it is, I doubt you’ll score 100% in your own personal test).  Build your own results from actual test scores.

You can also download a ZIP file of authentic WinSMITH Weibull files so you can reproduce the graphs shown in Figures 1-4 using the demo version of WinSMITH Weibull.  Remember if you enter your own data into the demo spreadsheet it will randomize your data but the authentic files will be processed with fidelity.


Refer to the caveats on the Problem Of The Month Page about the limitations of the following solution. Maybe you have a better idea on how to solve the problem. Maybe you find where I've screwed-up the solution and you can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.

Technical tools are only interesting toys for engineers until results are converted into a business solution involving money and time. Complete your analysis with a bottom line which converts $'s and time so you have answers that will interest your management team!



Last revised 5/05/2004
© Barringer & Associates, Inc. 2004