Problem Of The Month
December 2003---Make Repairs On Overtime?
For redundant equipment, should I make repairs on straight time to defer costs, or should I shorten the repair time by using overtime at extra cost? This is often an emotional question, and the facts below will help you make the decision based on the merits of the case rather than on emotion.
We have two devices working in parallel configuration on a continuously operating system. Both devices have accumulated the same number of operating hours, and we need one to survive.
We have just recorded a failure on one device. Now, we no longer have a redundant system.
Should we repair the dead device on straight time or overtime? How do we make a rational decision based on the facts? What facts do we need to make the computation?
We will keep the problem simple by using the exponential distribution of failure, for which the failure rate is constant over time.
Here are the facts for the simple case:
1. Assume we have perfect switching between devices.
2. Each device has a mean time to failure of 3 years which is the same as 26,280 hours/failure, and thus the failure rate is 1/26280 = 0.000038052 failures/hour.
3. The failure has occurred at 20,000 hours.
4. The time to make repairs is 20 hours. The delay time for corrective maintenance on overtime is 10 hours and without overtime the delay is 400 hours. This means the unit will be restored to the up condition in 30 hours for overtime and 420 hours for non-overtime conditions.
5. The repair cost is $1000 for non-overtime and $2000 for overtime.
6. If the surviving operating unit fails during the repair/delay period, the cost of lost gross margin is $30,000.
7. The risk equation is $Risk = POF*$Consequence where POF = probability of failure and $Consequence = $30,000.
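The facts above can be collected into a few named values before any calculation is made. This is only a minimal sketch in Python; the variable names are illustrative assumptions and are not part of the original problem statement.

```python
# Inputs for the simple case. Names are illustrative; values come from the list above.
MTTF_HOURS = 3 * 8760             # 3 years at 8,760 hours/year = 26,280 hours/failure
FAILURE_RATE = 1.0 / MTTF_HOURS   # constant failure rate, failures/hour
AGE_AT_FAILURE = 20_000           # operating hours on both devices at the first failure
REPAIR_HOURS = 20                 # hands-on repair time
DELAY_OVERTIME_HOURS = 10         # delay before repair when overtime is used
DELAY_STRAIGHT_HOURS = 400        # delay before repair on straight time
COST_OVERTIME = 2_000             # repair cost on overtime, dollars
COST_STRAIGHT = 1_000             # repair cost on straight time, dollars
CONSEQUENCE = 30_000              # lost gross margin if the survivor also fails, dollars

# Derived quantities used in the steps that follow.
exposure_overtime = REPAIR_HOURS + DELAY_OVERTIME_HOURS   # hours without redundancy
exposure_straight = REPAIR_HOURS + DELAY_STRAIGHT_HOURS
overtime_premium = COST_OVERTIME - COST_STRAIGHT          # extra dollars paid for speed

print(f"failure rate      = {FAILURE_RATE:.9f} failures/hour")
print(f"exposure overtime = {exposure_overtime} hours")
print(f"exposure straight = {exposure_straight} hours")
print(f"overtime premium  = ${overtime_premium}")
```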
Calculations For The Simple Case:
Step 1: The probability for surviving to 20,000 hours is:
R(t) = e^(-λt) = e^(-0.000038052*20000), where λ is the failure rate and t is the age in hours
= e^(-0.76104) = 0.46718.
This says the probability the device will survive 20,000 hours is 46.718%.
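Step 1 can be checked with a short survival function. The sketch below is an illustration only: the function name `reliability` is an assumption, and the failure rate is the rounded value used in the hand calculation.

```python
import math

FAILURE_RATE = 0.000038052  # failures/hour, rounded as in the hand calculation


def reliability(t_hours: float, rate: float = FAILURE_RATE) -> float:
    """Exponential survival probability R(t) = exp(-rate * t)."""
    return math.exp(-rate * t_hours)


r_20000 = reliability(20_000)
print(f"R(20,000 hours) = {r_20000:.5f}")
```

The same function can be called with 20,030 or 20,420 hours to check the later survival probabilities.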
Step 2: The probability for surviving to 20,030 hours under conditions of overtime is:
R(t) = e^(-0.000038052*20030)
= e^(-0.76218156) = 0.46665.
This says the probability the device will survive 20,030 hours is 46.665%.
Step 3: The probability of failure between 20,000 hours and 20,030 hours under overtime repairs is:
POF from 20,000 to 20,030 hours = R(20,000) - R(20,030) = 0.46718 - 0.46665 = 0.00053
This says the probability of failure of the surviving device during the overtime repair period, viewed from time zero, is 0.053%.
Step 4: Given that the redundant device has survived to 20,000 hours, what is the conditional probability of failure during the next 30 hours?
Conditional POF during the next 30 hours = (0.46718 - 0.46665)/0.46718 = 0.00053/0.46718 = 0.00113
This says that, given the device has survived to age 20,000 hours, the conditional probability of failure of the surviving device during the 30-hour overtime repair period is 0.113%.
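Steps 1 through 4 can be sketched in a few lines of Python. The function names are assumptions for illustration. The sketch also checks a known property of the exponential distribution: it is memoryless, so the conditional probability of failure over a 30-hour window does not depend on the age of the device and equals 1 - e^(-rate*30). At full precision the result is about 0.00114 rather than the hand-rounded 0.00113.

```python
import math

FAILURE_RATE = 1.0 / 26_280  # failures/hour, from the 3-year mean time to failure


def reliability(t_hours: float) -> float:
    """Exponential survival probability R(t) = exp(-rate * t)."""
    return math.exp(-FAILURE_RATE * t_hours)


def conditional_pof(age_hours: float, window_hours: float) -> float:
    """P(failure within the window | device has survived to age_hours)."""
    r_age = reliability(age_hours)
    return (r_age - reliability(age_hours + window_hours)) / r_age


pof_at_20000 = conditional_pof(20_000, 30)           # Step 4, full precision
pof_when_new = conditional_pof(0, 30)                # same window on a new device
pof_direct = 1.0 - math.exp(-FAILURE_RATE * 30)      # memoryless shortcut

print(f"conditional POF at 20,000 hours: {pof_at_20000:.6f}")
print(f"conditional POF at 0 hours:      {pof_when_new:.6f}")
print(f"1 - exp(-rate * 30):             {pof_direct:.6f}")
```

Because all three values agree, only the length of the exposure window matters for the simple case; age would matter again under a non-constant failure rate.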
Step 5: How much money is at risk for failure during the 30-hour overtime period?
$Risk = 0.00113*$30,000 = $33.90
Step 6: The probability for surviving to 20,420 hours under the normal repair interval is:
R(t) = e^(-0.000038052*20420)
= e^(-0.77702184) = 0.45977
This says the probability the device will survive 20,420 hours is 45.977%.
Step 7: The probability of failure between 20,000 hours and 20,420 hours under the normal repair interval is:
POF from 20,000 to 20,420 hours = R(20,000) - R(20,420) = 0.46718 - 0.45977 = 0.00741
This says the probability of failure of the surviving device during the normal repair period is 0.741%, which is a much greater chance of failure than if the repair is completed quickly with overtime.
Step 8: Given that the redundant device has survived to 20,000 hours, the conditional probability of failure during the next 420 hours for normal repairs is:
Conditional POF during the next 420 hours = (0.46718 - 0.45977)/0.46718 = 0.00741/0.46718 = 0.01586
This says that, given the device has survived to age 20,000 hours, the conditional probability of failure of the surviving device during the 420-hour normal repair period is 1.586%.
Step 9: How much money is at risk under each repair option? It's important to convert the issues to money using $Risk = POF*$Consequence.
The $Risk exposure during the 30-hour overtime repair (see step 4) is 0.00113*$30,000 = $33.90.
The $Risk for routine repairs during the 420-hour interval (see step 8) is 0.01586*$30,000 = $475.80.
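The two $Risk figures can be reproduced with a short Python sketch. It is an illustration under the simple-case assumptions, with a hypothetical helper name `risk_dollars`, and it carries full precision instead of the rounded intermediate values, so the results differ from the hand figures by a few tens of cents.

```python
import math

FAILURE_RATE = 1.0 / 26_280   # failures/hour
CONSEQUENCE = 30_000.0        # dollars of lost gross margin if the survivor fails


def risk_dollars(exposure_hours: float) -> float:
    """$Risk = POF * $Consequence for a given time without redundancy.

    Uses the constant failure rate, so POF = 1 - exp(-rate * exposure).
    """
    pof = 1.0 - math.exp(-FAILURE_RATE * exposure_hours)
    return pof * CONSEQUENCE


risk_overtime = risk_dollars(30)    # 20 hours repair + 10 hours delay
risk_straight = risk_dollars(420)   # 20 hours repair + 400 hours delay

print(f"$Risk on overtime:      ${risk_overtime:,.2f}")
print(f"$Risk on straight time: ${risk_straight:,.2f}")
print(f"$Risk reduction:        ${risk_straight - risk_overtime:,.2f}")
```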
Step 10: Would you spend $1,000 extra for overtime repair to reduce your risk by $475.80 - $33.90 = $441.90? The answer is no, because the extra cost is more than the risk it removes. The numbers can turn out much differently if the normal repair times are too long (for example, if elapsed repair times [repair + delay times] for this example exceed 1020 hours, then overtime repairs are beneficial).
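The decision in Step 10 can be written as a simple comparison, and the break-even elapsed repair time can be solved for directly. The sketch below is an illustration under the simple-case assumptions only; the function names are hypothetical. Solving the exponential model at full precision puts the break-even near 922 hours, somewhat below the 1020 hours quoted above, so treat either figure as approximate.

```python
import math

FAILURE_RATE = 1.0 / 26_280              # failures/hour
CONSEQUENCE = 30_000.0                   # dollars
OVERTIME_PREMIUM = 2_000.0 - 1_000.0     # extra repair cost for overtime, dollars
OVERTIME_EXPOSURE = 30.0                 # hours without redundancy on overtime


def pof(exposure_hours: float) -> float:
    """Probability the surviving unit fails during the exposure window."""
    return 1.0 - math.exp(-FAILURE_RATE * exposure_hours)


def overtime_pays_off(straight_exposure_hours: float) -> bool:
    """True when the $Risk removed by overtime exceeds the overtime premium."""
    reduction = (pof(straight_exposure_hours) - pof(OVERTIME_EXPOSURE)) * CONSEQUENCE
    return reduction > OVERTIME_PREMIUM


# Break-even: (pof(T) - pof(30)) * CONSEQUENCE = OVERTIME_PREMIUM, solved for T.
target_pof = pof(OVERTIME_EXPOSURE) + OVERTIME_PREMIUM / CONSEQUENCE
break_even_hours = -math.log(1.0 - target_pof) / FAILURE_RATE

print(f"overtime pays off at 420 hours?  {overtime_pays_off(420)}")
print(f"overtime pays off at 1020 hours? {overtime_pays_off(1020)}")
print(f"break-even elapsed repair time:  {break_even_hours:.0f} hours")
```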
By the way, if you repeat these calculations in Excel, you will get slightly different numbers than these hand calculations. Excel carries more significant digits, while the values above were rounded at each step on a hand calculator to show how they are found.
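The size of the rounding effect can be seen by running the overtime case both ways. This is a sketch for illustration: the first path rounds each intermediate value to five decimals as in the hand calculation, and the second carries full floating-point precision, as a spreadsheet would.

```python
import math

RATE_ROUNDED = 0.000038052   # failures/hour, rounded as in the hand calculation
CONSEQUENCE = 30_000

# Hand-calculation path: round every intermediate result to five decimals.
r_start = round(math.exp(-RATE_ROUNDED * 20_000), 5)   # survival to 20,000 hours
r_end = round(math.exp(-RATE_ROUNDED * 20_030), 5)     # survival to 20,030 hours
pof_hand = round((r_start - r_end) / r_start, 5)       # conditional POF
risk_hand = pof_hand * CONSEQUENCE

# Full-precision path: no intermediate rounding, unrounded failure rate.
pof_full = 1.0 - math.exp(-30 / 26_280)
risk_full = pof_full * CONSEQUENCE

print(f"hand-rounded $Risk:   ${risk_hand:.2f}")
print(f"full-precision $Risk: ${risk_full:.2f}")
```

The difference is a matter of cents here and does not change the decision.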
The bottom line:
You've got to know how things live and die, along with the costs for repairs and the consequences if two units fail at the same time. You can make this step-by-step example more accurate (but also more complicated) by introducing Weibull analysis and more complicated cost profiles. Make your decisions rationally rather than emotionally; this requires the facts.
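As a hint of what the Weibull extension might look like, the sketch below replaces the exponential survival function with a two-parameter Weibull, R(t) = e^(-(t/eta)^beta). Everything beyond the simple case is an assumption for illustration: the function names are hypothetical, beta = 2 is an arbitrary wear-out value that does not come from any data in this problem, and eta is left at 26,280 hours for convenience even though the mean time to failure equals eta only when beta = 1.

```python
import math


def weibull_reliability(t_hours: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull survival probability R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t_hours / eta) ** beta))


def conditional_pof(age_hours: float, window_hours: float,
                    beta: float, eta: float) -> float:
    """P(failure within the window | device has survived to age_hours)."""
    r_age = weibull_reliability(age_hours, beta, eta)
    r_end = weibull_reliability(age_hours + window_hours, beta, eta)
    return (r_age - r_end) / r_age


ETA = 26_280.0  # hours; equals the mean time to failure only when beta = 1

# beta = 1 reduces the Weibull to the exponential simple case.
pof_exponential = conditional_pof(20_000, 30, beta=1.0, eta=ETA)

# beta = 2 is a hypothetical wear-out shape: the failure rate rises with age.
pof_wearout = conditional_pof(20_000, 30, beta=2.0, eta=ETA)

print(f"conditional POF, beta = 1: {pof_exponential:.6f}")
print(f"conditional POF, beta = 2: {pof_wearout:.6f}")
```

With a wear-out shape the age of the surviving device matters, so the conditional step (Steps 4 and 8) can no longer be skipped.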
Refer to the caveats on the Problem Of The Month Page about the limitations of this solution. Maybe you have a better idea on how to solve the problem. Maybe you will find where I've screwed up the solution and can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.
Thanks to El Hadi, Development Manager,