Don't believe what you see, at least at first.
Many people go through a point in their lives where they question the beliefs they hold most dear. Then there are those who question the entire basis for statistical process control (SPC) once they have learned the statistics behind it. I can’t help you with the former, but I have something to say to the latter after the break.
There are lots of ways to mess up control charts, but today I’d like to discuss a concern I hear from students and clients who have learned some statistics and then encounter one of two situations. Interestingly, although one example involves a very large sample size (for continuous data) and the other a fairly small one (for discrete data), the root of the problem is the same misunderstanding about control charts. In both cases, you might be tempted either to spend money trying to control a process, only to lose more because of increased variation, or to abandon control charts altogether as tools that don’t work for you.
Let me give you two scenarios.
Scenario 1
You have a high-volume process that automatically samples 50 units for your control chart. You decide to use an x-bar and s chart because you know s is a better chart to use when n > 7. This is what you get:
Figure 1 - A standard x-bar and s chart for n=50
Is your process really that out of control? If so, should you have been reacting and adjusting the process all that time for those out-of-control signals? If you tell your production group to do that, they won’t be able to make anything at all—they will be spending all their time making tiny adjustments to the process.
Dang it! I guess SPC doesn’t work at our business.
Scenario 2
You have a low-volume process that only produces 100 units a day. This is a bleeding-edge process and you’re still doing product control as you learn more about the process variables. You only have a pass-fail test, so you’re using a p-chart. However, you have just had an SPC class, and your instructor told you that the control limits on the p-chart are based on approximately (because the data are discrete) the 99.73-percent confidence interval around the average failure rate. Because you measure everything you produce, there should be no sampling error in your estimate of the fail rate; after all, you just measured the entire population, right? And if there’s no error in the estimate of your nonconformance rate, does this mean that any day that doesn’t exactly match your overall average is a day that you’re out of control, as in the chart below? Time for some heads to roll, I guess.
Figure 2 - A p-chart with limits +/- 0.5 units around the historical average
Well, obviously there’s something screwy with these control charts. I’ll give you a couple of hints to see if you can figure out what is behind both of these scenarios. First, let’s say that in Scenario 1, instead of measuring all 50, you had measured only five for each point. Let’s keep it as an x-bar and s chart, even though with n = 5 we would probably choose an x-bar and R chart. That way we’re comparing the same things. Using the exact same data, but with n = 5 randomly chosen from the 50:
Figure 3 - A standard x-bar and s chart but using n = 5 from the 50
Wow! That looks pretty good. That’s your first clue.
The second hint is from Scenario 2. What population are we really tracking here? Is it the nonconformance rate we had today? What’s going on?
We first need to review the purpose of the control chart. Control charts are a powerful tool for identifying whether a process is subject only to common cause variability (random variability that’s common to the process) or to both common and special cause variability (in addition to the usual process variation, some destabilizing effect is present). The whole reason for doing that is to avoid two errors. Statisticians call them alpha and beta, or producer’s and consumer’s risk. We can think of them as “reacting when I should not” and “not reacting when I should.”
Why avoid reacting when I should not? Well, if adjusting the process costs money, then we waste money that we didn’t need to spend. But, as Dr. W. Edwards Deming showed, what’s even more important is that adjusting a process that’s subject only to common cause variability will actually add to the process variability. A process only affected by common cause variability is in statistical control.
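To see how adjusting a stable process adds variability, here is a minimal simulation sketch. It assumes a hypothetical process with normally distributed common cause noise and an operator who, after every unit, moves the setpoint by the full amount of the last deviation from target (usually described as Rule 2 of Deming's funnel experiment). The target, sigma, and function name are illustrative and are not taken from the charts in this article.

```python
import random
import statistics

def simulate(n_units, sigma, tamper, seed=1):
    """Return outputs of a stable process, optionally 'adjusted' after every unit."""
    rng = random.Random(seed)
    target = 0.0
    setpoint = target          # where the process is currently aimed
    outputs = []
    for _ in range(n_units):
        y = setpoint + rng.gauss(0.0, sigma)   # common cause noise only
        outputs.append(y)
        if tamper:
            # Operator compensates for the last deviation from target
            setpoint -= (y - target)
    return outputs

hands_off = simulate(20_000, sigma=1.0, tamper=False)
tampered = simulate(20_000, sigma=1.0, tamper=True)

sd_hands_off = statistics.pstdev(hands_off)
sd_tampered = statistics.pstdev(tampered)
print(f"SD left alone: {sd_hands_off:.2f}")
print(f"SD with adjustment after every unit: {sd_tampered:.2f}")
```

Under this adjustment rule each output carries both the current noise and the negative of the previous noise, so in theory the variance doubles and the standard deviation grows by a factor of about 1.41.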
And of course we want to avoid not reacting when we should—if something is happening, we want to know about it as soon as possible. A process affected by special and common causes is out of statistical control; we can’t predict what’s going to happen next.
When you get right down to it, when to react or not is an economic question. When Walter Shewhart invented the control chart, he used the statistics as a heuristic (a cocktail-party word for a decision-making rule) for balancing these two errors. The reason that we use three standard errors (the 99.73-percent confidence interval) on most control charts is so that it takes a pretty clear difference before we adjust a process, so as to minimize reacting when we shouldn’t. At the same time, when we’re pretty sure there has been a change in the process, we know that we should investigate and react.
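As a rough illustration of what the three-standard-error rule buys, the sketch below computes the chance that a point from a stable process lands outside the limits. It assumes the plotted statistic is exactly normal, which is an idealization, and the variable names are mine.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, an idealized model of the plotted statistic

coverage = z.cdf(3) - z.cdf(-3)       # chance a point falls inside the limits
false_alarm = 1 - coverage            # "reacting when I should not", per point
avg_run_length = 1 / false_alarm      # average points between false alarms

print(f"inside 3-sigma limits: {coverage:.4%}")
print(f"false alarm rate per point: {false_alarm:.4%}")
print(f"about one false alarm every {avg_run_length:.0f} points")
```

Narrower limits would catch real changes sooner but would raise the false alarm rate, which is the economic trade-off described above.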
All right, so if that is what we’re doing with control charts, why are they not working in our two scenarios?
Our purpose is to use the statistics as a heuristic for making economic decisions. We know that processes are affected by many small sources of variability all the time, and the process output today is probably not coming from exactly the same distribution as the process of yesterday. We model the process output as if it’s coming from an idealized distribution because it is useful to do so, but nothing really follows these distributions. We just need to know if it is different enough to economically justify an intervention in the process.
I once had a lot of historical data for two strands of steel being cast out of the same crucible. It turned out that there was a statistically significant difference between the two strands over a long period of time, but the difference was so tiny that it was completely unimportant.
Similarly, our control chart in Scenario 1 has a really large sample size, giving it the power to detect real but minor shifts in the mean. The means for each sample are randomly shifting by 1.2 units up and down, as are the standard deviations. So neither the means nor the standard deviations stay the same over time. But these differences are extraordinarily small compared to the variability we see in the individuals (about +/-10 for the means), so no one cares. When we lowered the sample size to five, our control limits got wider and the minor shifts in the true average that we detected before disappeared into the background, along with all the other unknown sources of variability common to the process. If that were the chart we had chosen, we wouldn’t even know that they are occurring.
Don’t get me wrong, these are real, statistically significant differences. But remember, our purpose in doing the control chart is to give us a rule for when to intervene or leave the process alone. If we were to try to react to these differences, which are small compared to the within-sample variation, by adjusting the process, we would probably end up overreacting due to the limited resolution of our adjustments, and end up actually increasing the variability. We have a much bigger problem we need to tackle: quantifying, controlling, and reducing the variability of the individuals. Sounds like a job for a control chart, but what do I do?
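The effect of sample size on the x-bar limits can be sketched with simulated data. This is a hypothetical process, not the article's data set: the within-sample standard deviation of 2.8 is borrowed from the figure quoted later in the article, while the process center of 100 and the small random wander in the true mean (standard deviation 0.75) are assumptions made only for illustration. The helper names are mine.

```python
import math
import random
import statistics

def c4(n):
    """Bias-correction constant for the sample standard deviation."""
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def xbar_chart(subgroups):
    """Return (center, half_width, points_outside) for an x-bar chart based on s-bar."""
    n = len(subgroups[0])
    means = [statistics.fmean(g) for g in subgroups]
    s_bar = statistics.fmean([statistics.stdev(g) for g in subgroups])
    center = statistics.fmean(means)
    half_width = 3 * s_bar / (c4(n) * math.sqrt(n))
    outside = sum(abs(m - center) > half_width for m in means)
    return center, half_width, outside

rng = random.Random(7)
# Hypothetical process: within-sample SD of 2.8, true mean wandering slightly
subgroups = []
for _ in range(200):
    true_mean = 100 + rng.gauss(0, 0.75)
    subgroups.append([rng.gauss(true_mean, 2.8) for _ in range(50)])

full = xbar_chart(subgroups)                     # all 50 units per point
# Units are generated in random order, so the first 5 act as a random 5
small = xbar_chart([g[:5] for g in subgroups])   # 5 units per point

print(f"n=50: limits +/-{full[1]:.2f}, {full[2]} of 200 points outside")
print(f"n=5:  limits +/-{small[1]:.2f}, {small[2]} of 200 points outside")
```

Because the limits scale with one over the square root of n, the n = 50 chart has limits roughly a third as wide as the n = 5 chart, so the same small wander in the mean that trips the first chart stays inside the second.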
This leads us to the second scenario: In this one, we’re not interested in the population of what we made today, but in the variation of the proportion nonconforming from now and into the future. So, measuring 100 percent of today’s production is only a sample of the population we’re interested in—our production through time. There’s error in our estimate of what the long-term proportion nonconforming is, and we should be using an interval estimate to decide if we should react or not. But what estimate?
The right charts
So what would be the best way to achieve Shewhart’s objective of reacting when we should and not reacting when we should not? Here’s what I would do.
In the first case, we’re detecting real, but relatively unimportant signals. If I try to adjust to those, I will only be making things worse. Obviously, the change in the population average we’re detecting here is part of the process as it stands. If I can eliminate these changes, then sure, I would do it. If not—which I think is more likely, as they’re small compared to the within-sample variation—then what heuristic can I use for making the decisions to adjust or not adjust the process? I suppose I could just take samples as we did in figure 3, but I don’t like throwing out data.
We have evidence that the drift in the mean between points is small compared to the common cause variation. The average within-sample standard deviation is around 2.8, whereas if I look at the standard deviation across everything I only have 2.9. So most of the variability is coming from within the 50 units that go into each point. Analytically, we would do a one-way random effects ANOVA across the samples, decompose the variance into its sources, and arrive at the same numbers. Because this is the case, the drift might be both common to the process and so small as to not be a concern right now. If the shifts in the mean vary randomly around the overall mean, they just add a small component of variability to the means, and I could treat these averages as if they were individual observations subject to common cause variability. And if that is true, I can put these statistics on an individuals chart. To do this, I need to check whether the averages and standard deviations are normally distributed. Because they are, I can go ahead and plot them as if they were individual data points. In the control chart below, I’m using the same averages and standard deviations as figure 1, but I’m calculating the control limits from the average moving range of the x-bars and s’s. Kind of like two individuals charts stacked on each other:
Figure 4 - Control chart plotting the averages and standard deviations as individuals using the average moving range
That looks like a process that’s predictable and stable. In other words, it is in control.
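For readers who want to try this, here is a minimal sketch of individuals-chart limits computed from the average moving range. The averages and standard deviations listed are made-up values for illustration, not the data behind figure 4, and the function name is mine. The multiplier 2.66 is the usual 3/d2 with d2 = 1.128 for moving ranges of two consecutive points.

```python
import statistics

def individuals_limits(values):
    """Return (lcl, center, ucl) for an individuals chart from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    center = statistics.fmean(values)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical subgroup averages and standard deviations, one pair per sample of 50
xbars = [100.4, 99.1, 100.9, 99.6, 100.2, 98.8, 101.1, 99.9, 100.5, 99.5]
sds = [2.7, 2.9, 2.8, 3.0, 2.6, 2.8, 2.9, 2.7, 2.8, 2.8]

for name, series in (("x-bar", xbars), ("s", sds)):
    lcl, center, ucl = individuals_limits(series)
    flagged = [v for v in series if not lcl <= v <= ucl]
    print(f"{name}: LCL={lcl:.2f} CL={center:.2f} UCL={ucl:.2f} flagged={flagged}")
```

The point of the design is that the limits now come from how much the plotted statistics move from one sample to the next, so the small sample-to-sample wander is counted as part of the routine variation instead of being flagged.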
These control limits are based on the variation of the points in figure 4, which in turn are based on the variation within each sample of 50 (which is the dominant variation going on here) and the variation as the process average changes (a minor and less important source). I now have a way to decide to react if the process is running differently enough from how it has in the past, when the past contains data that are changing on average a little bit from sample to sample. We have found a way to meet Shewhart’s objective.
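A back-of-the-envelope check of "dominant" versus "minor" can be made from the two standard deviations quoted earlier. This is a rough method-of-moments sketch, not the random effects ANOVA itself: it simply assumes the total variance is the within-sample variance plus the variance of the shifting means.

```python
import math

within_sd = 2.8    # average within-sample standard deviation (from the article)
overall_sd = 2.9   # standard deviation across all units (from the article)

# Treat total variance as within-sample variance plus between-sample variance
between_var = overall_sd**2 - within_sd**2
between_sd = math.sqrt(between_var)
between_share = between_var / overall_sd**2

print(f"between-sample SD: {between_sd:.2f}")
print(f"share of total variance from shifting means: {between_share:.1%}")
```

Under that assumption the shifting means contribute a standard deviation of roughly 0.75 units, or about 7 percent of the total variance, which is consistent with treating them as a minor source for now.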
However, don’t forget that this is what you’re doing. Once you find ways to reduce the within-sample variation (perhaps using an experimental design), you will want to figure out why the average is shifting from sample to sample. That source of variation is hidden in this chart, and if you forget it is there, you lose the opportunity to get even better.
In Scenario 2, now that we understand the daily results were themselves samples, not the population, we just use the standard exact control limits for a p-chart with n = 100. We find that, contrary to having excessive variability, we actually had a time where we had a lot less than the expected variability around the average. Instead of beating ourselves up about how poorly we were doing by not making exactly 4 percent every day, we should have been investigating what we did between Day 7 and Day 22 to reduce the variability so that we can make it permanent.
Figure 5 - Standard p-chart with exact limits
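As a sketch of how such limits arise, the code below uses the textbook three-sigma formula based on the normal approximation to the binomial, with the 4 percent average and n = 100 from this scenario. The "exact" limits in figure 5 may be computed differently, so treat these numbers as approximate. The daily failure counts and the function name are hypothetical.

```python
import math

def p_chart_limits(p_bar, n):
    """Three-sigma p-chart limits using the normal approximation to the binomial."""
    sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)
    lcl = max(0.0, p_bar - 3 * sigma_p)   # a proportion cannot go below zero
    ucl = min(1.0, p_bar + 3 * sigma_p)
    return lcl, ucl

p_bar = 0.04   # historical average proportion nonconforming (from the article)
n = 100        # units produced and tested per day (from the article)

lcl, ucl = p_chart_limits(p_bar, n)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")

# Hypothetical daily failure counts, only to show how points are judged
daily_failures = [4, 6, 2, 5, 3, 11, 4]
for day, fails in enumerate(daily_failures, start=1):
    p = fails / n
    status = "investigate" if not lcl <= p <= ucl else "leave alone"
    print(f"day {day}: p = {p:.2f} -> {status}")
```

With these inputs the upper limit is a little under 10 percent and the lower limit is zero, so a day with anywhere from 0 to 9 failures out of 100 is consistent with a stable 4 percent process.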
As opposed to demonstrating how SPC fails at our business, the two scenarios described above actually show us how SPC is a very useful heuristic for telling us when to appropriately react to signals from our processes. But to do that, we have to have an understanding of the purpose of control charts and the statistics behind them. The error in both scenarios was in misunderstanding the population we were really sampling from, though it manifests itself in two different ways.
This is not changing the calculations until you see what you want; it’s based on the sound premise of the purpose of control charts: a statistically based way to make economic decisions about when to react and when to leave the process alone.
So if someone you know reaches that crisis and doubts the validity of SPC, remember your friendly neighborhood Heretic and guide them back.
Thanks to MVPPrograms for the use of MVPstats in writing this article. It’s really easy to change how the limits are generated. They have a shareware version if you want to try it out.