Alliance Reports

How To Do a Long-Term Study

We use measurement systems to make decisions every day, but how do you know if these measurements are correct? Even a calibration sticker only tells you if it was right in the past. How do you know it is right today? In the video, I show how to set up a long-term study and how to use the data you generate to determine what is going on with your measurement device, what that means for your process, and how you might make it better.


Hi, this is Steve Ouellette, President of the ROI Alliance, and I thought I'd spend today showing you how ROIstat, our free statistical analysis package, does a long-term measurement system analysis.

So what's a long-term measurement system analysis, or MSA? A long-term MSA tracks a measurement device and system over time to make sure they haven't changed since the last time you tested them. For example, a calibration sticker records the point in time when the device was calibrated to a particular standard, but it doesn't say anything about what's happening today. The same goes for a potential study or gauge R&R study, which is a snapshot in time that could have been taken months or even years ago. So has anything changed in the interim? That matters, because when something in the measurement system changes, it looks as if something in the product or the process has changed, and you react to it that way.

Every critical characteristic and every critical process parameter you have should have a long-term measurement system analysis, to make sure you're reacting to something real and not to a change in the measurement device itself.

These long-term measurement system analyses track control, or stability, through time, detecting any drifts or changes. They also track variation over time: both the overall variation you've experienced and the variation from one measurement device to another, as well as any change in variation with the magnitude of what you're measuring. That's called uniformity. They can also track bias and linearity. Bias is the difference from the true value, and linearity is the change in bias with magnitude. You can test these only if you know the true values for your test parts. You may not know them, in which case you may have to assess bias and linearity separately.

Okay, so let's go through a case study. Of course, because we're trying to make it interesting, this case study has lots of problems with it. It's kind of boring to go through one that doesn't!

This is a final measurement of the thickness of a plastic sheet. It's an automated measurement device, and you've got two of them. You've been using them for a while on thinner material, but management wants to measure the thicker material with these automated devices as well. The vendor tells you it's plus or minus 0.5 millimeters.

Okay, so how do you do a long-term study? In this case we're going to take eight samples, using eight so that we can track spread with a standard deviation, and we're going to measure each of them 25 times. That could be once a day, three times a day, five times a day, whatever makes sense for you. After we generate all that data, we do the analysis.
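The transcript doesn't prescribe a data layout, so here is a minimal sketch of a run sheet for this design: two machines, eight reference sheets, 25 observation sets. The machine labels and the choice to shuffle part order within every set are assumptions for illustration (the transcript mentions randomizing the part order for the machines), not ROIstat features.

```python
import random

# Hypothetical labels for the case-study design: two automated gauges,
# eight reference sheets, and 25 observation sets.
MACHINES = ("Gauge 1", "Gauge 2")
N_PARTS = 8
N_SETS = 25

def run_sheet(seed: int = 1) -> list[tuple[int, str, int]]:
    """Return (set, machine, part_id) rows for the whole long-term study.

    Part order is shuffled within every set on every machine, so any drift
    in the gauge is not tied to a fixed measurement order.
    """
    rng = random.Random(seed)
    rows = []
    for set_no in range(1, N_SETS + 1):
        for machine in MACHINES:
            parts = list(range(1, N_PARTS + 1))
            rng.shuffle(parts)
            rows.extend((set_no, machine, part) for part in parts)
    return rows

sheet = run_sheet()
print(len(sheet))  # 25 sets x 2 machines x 8 parts = 400 readings
print(sheet[:3])
```

Each row becomes one line in the data table (set, operator/machine, ID), with the measured thickness filled in when the reading is taken.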

The samples are taken from typically measured product, ranging from 52 to 125 mm thick, to span the range you'd expect the gauge to be used over. In this case the true values are known: we measured these eight sample sheets with a process traceable back to NIST, so we know the exact thickness of each sample, and that will allow us to test for both bias and linearity.

The specification for each nominal thickness is plus or minus 0.5 millimeters, so the width of the spec for all the different sheets is 1 mm. From previous data we know that the standard deviation of the manufacturing process itself is 0.17 mm.

Okay, let's take a look at the data; you can see I already have it highlighted here. The operator column in this case is the two different automated machines. The ID is the identification of the particular part, which we would normally hide from an operator, but since these are machines we just randomize the order. The set is the observation number, for example from one day to the next. Then come the measurements themselves, and up here I've got the true values associated with each ID number.

Okay, so we copy that, go over to ROIstat, and paste it in.

We click over to MSA and make sure all the right columns are in the right dropdown lists. This is going to be a long-term study, so we click on long-term study. We do have true values, so we check linearity and bias and assign the column that holds the true values. By default, when you select a long-term study, ROIstat selects the standard deviation, because you're going to want at least eight parts that you measure through time.

Keep in mind that you are going to maintain these parts over time, so make sure you preserve them past this study. The whole benefit of a long-term study is that you can keep coming back to it at some periodicity, say once a week or once a day, whatever makes sense, so that you can continue to validate that the measurement system is performing the way it did the first time you used it.

Okay, the spec range was 1 mm, and whatever our overall average was, we know the process standard deviation was 0.17.

Okay, let's take a look. Looking at the summary results, we can already see we're going to have a problem. The overall variability of the measurement system itself is 261% of the width of the spec, so I already know I've got an issue. Let's try to figure out what that issue is.
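A percent-of-tolerance figure like 261% converts the measurement standard deviation into a spread and divides it by the spec width. The transcript doesn't state ROIstat's spread multiplier, so the sketch below assumes the common 5.15-sigma convention (about 99% of a normal distribution); some references use 6 sigma instead, which would imply about 0.435 mm rather than about 0.507 mm.

```python
# ASSUMPTION: measurement spread = 5.15 standard deviations. ROIstat's exact
# convention is not given in the transcript; pass k=6.0 for a 6-sigma spread.
SPREAD_MULTIPLIER = 5.15

def percent_rr(sigma_ms: float, tolerance_width: float, k: float = SPREAD_MULTIPLIER) -> float:
    """Measurement spread as a percentage of the total tolerance width."""
    return 100.0 * k * sigma_ms / tolerance_width

def sigma_from_percent_rr(pct: float, tolerance_width: float, k: float = SPREAD_MULTIPLIER) -> float:
    """Back out the measurement standard deviation implied by a reported %R&R."""
    return pct / 100.0 * tolerance_width / k

sigma_ms = sigma_from_percent_rr(261.0, 1.0)  # spec width is 1 mm
print(round(sigma_ms, 3))                     # about 0.507 mm
print(round(percent_rr(sigma_ms, 1.0), 1))    # back to 261.0
```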

Ideally you'd like that ratio to be relatively low. You want to be able to slice your spec into chunks so that you can easily tell whether a part is in or out of spec. We'll see the impact on our ability to classify parts as in or out of spec when we get to the end of the analysis. I also want to be able to chop the variability of my process into pieces, which lets me easily detect changes in the process itself, or, if I'm doing some continuous improvement activity, see more easily whether the process has improved.

Okay, so we already know we've got an issue here. Let's take a look and see what's going on and where that issue might be coming from.

We selected different thicknesses on purpose, to exercise the gauge across the entire range of thicknesses we're going to be measuring. Because the charts all share the same y-axis, you can't really see very much, so uncheck match y-axis and each chart will get its own independent axis.

All right, let's take a look at our control charts. ROIstat generates a control chart, an individuals and moving range chart, for each part over time. Each chart is the same part, the same unit, measured again and again. It goes up and it goes down, but is it stable through time? That's our first criterion. Part one looks pretty stable: it goes up and down, but it's not out of control. The same goes for parts two, three, four, five, six, and seven.
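For readers who want to check a single part's chart by hand, here is a minimal individuals and moving range sketch. It uses the textbook constants for moving ranges of two consecutive points (2.66 for the individuals limits, 3.267 for the moving range upper limit); the transcript doesn't describe ROIstat's internals, and the readings below are hypothetical.

```python
from statistics import mean

E2 = 2.66   # 3 / d2, with d2 = 1.128 for moving ranges of span 2
D4 = 3.267  # moving range upper-limit factor (the lower factor D3 is 0)

def imr_limits(x):
    """Centre lines and control limits for an individuals and moving range chart."""
    mr = [abs(b - a) for a, b in zip(x, x[1:])]
    x_bar, mr_bar = mean(x), mean(mr)
    return {"x_lcl": x_bar - E2 * mr_bar, "x_center": x_bar, "x_ucl": x_bar + E2 * mr_bar,
            "mr_center": mr_bar, "mr_ucl": D4 * mr_bar}

def out_of_control(x):
    """Indices of points beyond the individuals limits, and indices of points
    whose moving range (change from the previous point) exceeds its limit."""
    lim = imr_limits(x)
    mr = [abs(b - a) for a, b in zip(x, x[1:])]
    points = [i for i, v in enumerate(x) if not lim["x_lcl"] <= v <= lim["x_ucl"]]
    ranges = [i + 1 for i, r in enumerate(mr) if r > lim["mr_ucl"]]
    return points, ranges

# Hypothetical repeat readings of one reference sheet with a sudden drop
readings = [10.0, 10.1, 9.9, 10.0, 10.1, 10.0, 9.9, 10.0, 10.1, 10.0, 8.8, 10.0]
print(out_of_control(readings))  # ([10], [10, 11])
```

The drop at index 10 shows up both as a point below the lower limit and as two large moving ranges, the same kind of combined signal discussed for appraiser 2, part five.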

Part 8 on appraiser one does have alternating values, so you might want to investigate it. When you see this in the real world, particularly with an automated gauge, it's often caused by an internal recalibration that happens too frequently, or by a dead band around that recalibration that is set too tight. The gauge takes a measurement, recalibrates, takes a measurement, recalibrates. But each recalibration is reacting to normal random variability, so it actually moves the average up and down and makes a pattern like this. You might want to investigate what's unique about part number eight on appraiser number one. Maybe something is set wrong on that machine.
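The transcript doesn't say which pattern rule ROIstat uses for this sawtooth. A common textbook choice is Nelson's rule 4, which flags fourteen or more consecutive points alternating up and down; the sketch below assumes that threshold and uses hypothetical readings.

```python
def longest_alternating_run(x):
    """Number of points in the longest sawtooth stretch (up, down, up, ...)."""
    if len(x) < 2:
        return len(x)
    diffs = [b - a for a, b in zip(x, x[1:])]
    best = cur = 2 if diffs[0] != 0 else 1
    for prev, d in zip(diffs, diffs[1:]):
        if d != 0 and prev * d < 0:
            cur += 1                     # direction flipped: the sawtooth continues
        else:
            cur = 2 if d != 0 else 1     # restart from the current pair of points
        best = max(best, cur)
    return best

def sawtooth_signal(x, n_points=14):
    """Assumed Nelson-style rule: n_points or more in a row alternating up and down."""
    return longest_alternating_run(x) >= n_points

# Hypothetical readings bouncing around 50.2 mm, as an over-eager recalibration might produce
readings = [50.2 + (0.3 if i % 2 else -0.3) for i in range(16)]
print(sawtooth_signal(readings))  # True
```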

Over on appraiser 2, when we look at the parts, we get to part five and see something going on: a big, sudden drop. The upside-down triangle indicates a moving range violation and a point outside the expected variability. Something happened there that we would investigate. It could be random noise, but I don't necessarily want to bet that way. I want to see whether the part got damaged, or the machine itself got reset, or something like that.

We also see a bit of a run at the beginning here: eight points above the mean, which again we don't expect to see very often. But keep in mind that we're generating a lot of control charts and not controlling for that, so we actually expect to see some out-of-control conditions that are just false signals. We'll still investigate them and make sure there's nothing strange going on.
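Two ideas from this paragraph can be made concrete. The run check counts consecutive points on one side of the center line; the eight-point threshold follows the transcript (some rule sets use nine). The false-alarm estimate assumes independent, normally distributed points, each with the usual 0.27% chance of falling outside 3-sigma limits, and counts only points beyond the limits on the 16 individuals charts (2 machines x 8 parts x 25 points). Run rules and the moving range charts add further chances of a false signal, so this is a lower bound.

```python
def longest_one_side_run(x, center):
    """Longest run of consecutive points strictly on one side of the center line."""
    best = cur = side = 0
    for v in x:
        s = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            cur += 1
        else:
            cur, side = (1 if s != 0 else 0), s
        best = max(best, cur)
    return best

def prob_any_false_alarm(n_points, p_single=0.0027):
    """Chance that at least one of n independent in-control points falls
    outside 3-sigma limits purely by chance."""
    return 1.0 - (1.0 - p_single) ** n_points

print(longest_one_side_run([1, 2, 3, -1, -2, 0, 4], center=0))  # 3
print(round(prob_any_false_alarm(2 * 8 * 25), 2))              # about 0.66
```

In other words, with this many charts you are more likely than not to see at least one point outside the limits even when nothing has changed, which is why each signal is investigated rather than acted on blindly.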

Okay, we see a couple of other moving range violations here, a big jump up here and a big jump down there. We'd investigate those, but overall I'm not seeing many of the out-of-control conditions that would indicate the gauge is changing with time. So, pending what those investigations turn up, I'd say provisionally that it looks reasonably in control.

So let's proceed with the analysis. The next section looks at the same numbers, but now as averages. I'm averaging across all eight of the samples that I keep going back and measuring, so this is the average of my eight samples in the first observation, the second observation, and so on.

Whatever the true values of those parts are, I would expect their average to be the same time after time, since it's the same eight parts I'm averaging. It varies a little because of measurement variability and measurement error, so there is point-to-point variation, but it looks pretty stable.

This is the average of all eight parts on the first machine. The reason we do this is that a subtle change, like a small shift in the average for all the parts, wouldn't necessarily show up on the individuals charts, but you probably would see it on the average chart. It highlights more subtle differences that you wouldn't necessarily see on the individual charts themselves.

We do exactly the same thing with the standard deviation. Whatever the standard deviation is for those parts, it should be the same time and time again. They're different parts, so there's a known amount of standard deviation among them, but the measured value still varies, so is that variation in the standard deviation predictable? It looks pretty stable as well.

Keep in mind that neither of these is a traditional control chart. We calculate the control limits from the moving range, so these are like individuals and moving range charts. That's because the parts are not all the same: even if my measurement device were perfect, with zero measurement error, the different-sized parts would still have some average and some standard deviation. So I treat each observation's mean and standard deviation as an individual value and calculate the limits that way. The same goes for appraiser number two, for both the mean and the standard deviation.
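Following the description above, a minimal sketch: summarize each observation set by the mean and standard deviation of its eight readings, then put individuals limits (based on the average moving range) on each summary series. The dictionary layout and the tiny data set are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean, stdev

def set_summaries(data):
    """data maps set number -> readings of the reference parts in that set.
    Returns the per-set means and standard deviations in set order."""
    sets = sorted(data)
    return [mean(data[s]) for s in sets], [stdev(data[s]) for s in sets]

def individuals_limits(values):
    """(LCL, center, UCL) using the average moving range of consecutive values."""
    mr_bar = mean(abs(b - a) for a, b in zip(values, values[1:]))
    center = mean(values)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical: three sets, three parts each (a real study would have 25 sets of 8)
data = {1: [1, 2, 3], 2: [2, 3, 4], 3: [1, 2, 3]}
means, sds = set_summaries(data)
print(means, sds)                 # [2, 3, 2] [1.0, 1.0, 1.0]
print(individuals_limits(means))  # limits for the chart of set averages
```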

So really, this measurement system looks like it's pretty stable with time. So the excess amount of variability that we're seeing compared to the spec is not coming from big shifts in individual parts, or big changes in the overall measurement itself. It's pretty stable.

So let's continue and see if we can figure out what's going on. The next section is dispersion within part. Part number one was measured 25 different times, and so was part number two. Looking at the standard deviation of each part's 25 remeasurements, is there anything showing that one part had more variation than another, or that a group of parts had higher or lower variation than the others?

We don't see that. The standard deviations for each of the parts are pretty predictable: they fall within the usual calculated s-chart control limits, for both appraisers. Whatever the standard deviation is for these parts, it seems pretty predictable and pretty much the same. Nothing really stands out.
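The usual s-chart limits for subgroups of n readings are B3 and B4 times the average standard deviation, both derived from the bias-correction constant c4. Here each "subgroup" is one part's 25 readings. This is the standard textbook construction, assumed rather than confirmed to be exactly what ROIstat does.

```python
import math
from statistics import mean

def c4(n: int) -> float:
    """Bias-correction constant for the sample standard deviation of n readings."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def s_chart_limits(per_part_sd, n):
    """(LCL, center, UCL) for an s chart whose points are per-part standard deviations."""
    s_bar = mean(per_part_sd)
    k = 3.0 * math.sqrt(1.0 - c4(n) ** 2) / c4(n)
    return max(0.0, 1.0 - k) * s_bar, s_bar, (1.0 + k) * s_bar  # B3*s_bar, s_bar, B4*s_bar

# With s_bar = 1 the limits are simply B3 and B4 for n = 25 (about 0.565 and 1.435)
print(s_chart_limits([1.0] * 8, n=25))
```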

All right, so then we scroll down to the bias analysis. Now because we have the true value, we can compare the true value back to what we're getting for the average of each of these different parts.

I'm going to be looking at two things here, bias and linearity. Linearity is the change in bias depending on the magnitude of what I'm measuring. So if something's thinner maybe it has less bias, and as it gets thicker maybe it has more.

That's exactly what we see here. Model one expresses bias as a formula in the true value, and the p-value tests whether that relationship is significant. Here it is: there's a significant relationship between the magnitude I'm measuring and the bias from the true value. The line shows that as the true value goes up, the bias increases. Bias is the measured value minus the true value, so the measurement reads relatively higher as the sheets get thicker. For appraiser one it starts below zero, gets pretty close to zero bias, and then continues above zero as the sheets get thicker and thicker.

So already I know I've got something I need to correct for. How do I correct for it? It turns out bias and linearity corrections are actually pretty easy. If I had an overall bias of, say, a millimeter, I'd just subtract a millimeter from everything.

It's a little more complicated with linearity, but ROIstat generates the formula for you. With an automated device like this, you can sometimes put a linearity correction formula directly into the device, and that's what this would be, based on what we've measured so far: the true value equals this formula. It takes that tilted line and makes it flat and centered on zero bias. There's still variability around the line, but we've removed the bias relative to the true value.
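If the fitted line has the form bias = a + b x true, then measured = a + (1 + b) x true, so the correction is true = (measured - a) / (1 + b). The sketch below assumes that linear form; the formula ROIstat prints may be written differently, and the intercept and slope values here are hypothetical.

```python
def correct_reading(measured: float, intercept: float, slope: float) -> float:
    """Invert bias = intercept + slope * true, where bias = measured - true."""
    return (measured - intercept) / (1.0 + slope)

# Hypothetical fit: bias = -0.8 + 0.01 * true, so a 100 mm sheet reads 100.2 mm
print(correct_reading(100.2, intercept=-0.8, slope=0.01))  # back to about 100.0
```

A correction like this removes bias only; it does nothing about the scatter of individual readings around the line.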

I see a different result for the second machine. It's shifted a little higher: it starts off a little high and gets higher as the sheets I'm measuring get thicker. Again, I can put that into a correction factor.

In these automated ones, that's often pretty easy to do. If you're doing this by hand, of course that's an additional step for your operators and you'd have to consider how difficult that's going to be for them to do on an ongoing basis before you determine if this is going to be workable for you.

Okay, but this doesn't even affect the percent R&R calculation. It just establishes that there is in fact a difference from the true value, which we can correct for. We still haven't seen why our variability is so large compared to our specification.

And here is where we get the first hint. Uniformity, as I said before, asks whether the amount of variation in your measurement is related to the magnitude of what you're measuring, in this case the thickness. And it turns out that here it is. Both of these tests indicate significant effects, and you can see that as the thickness goes up, my standard deviation also goes up.

This didn't show up in any of the previous charts, which is exactly why we test for it: this test checks whether the dispersion is uniform regardless of what you're measuring. It turns out here it's not. As the thickness increases I get more and more standard deviation, more and more variability. So that's one of the sources I'm looking at. Why is it happening that way? We'll come back to that in a second.

This graph shows the measurement error for each part over time, along with the specifications where that makes sense. Because the parts are so different from each other, it's hard to see what's going on here, but this is the measurement error compared to the differences among all these thicknesses. I might want to see them normalized, so I click the normalize button, which brings each part back to its own average. Now I can look at the variability of each individual measurement around that average, and it's much easier to see.
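Normalizing, as described here, amounts to subtracting each part's own average from its readings so every part is centered on zero and the plus or minus 0.5 mm spec can be drawn once for all of them. A minimal sketch with hypothetical readings:

```python
from statistics import mean

def normalize(readings_by_part):
    """Subtract each part's own average so all parts share a common zero."""
    out = []
    for rs in readings_by_part:
        m = mean(rs)
        out.append([r - m for r in rs])
    return out

print(normalize([[1, 2, 3], [10, 12]]))  # [[-1, 0, 1], [-1, 1]]
```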

And then it draws my spec limits as well, because now the spec is uniform across all of them: plus or minus 0.5. This dash-dot line is the spec, and these dotted lines are the anticipated measurement error across both machines. You can see how much bigger that is than the specification itself, and that's why I end up with a very high percent R&R: the anticipated error is large compared to the spec.

To make it a little easier to see, I can add some jitter to move the dots around so they aren't all on top of each other. If you like, you can also draw a distribution density plot, a violin plot, to see the shape of those distributions. They tell me the same story: I've got a lot of variation, and that variation is large compared to the spec itself.

I can also draw a boxplot of the same data. It doesn't look very interesting with all the different magnitudes, so I normalize it and see pretty much what I saw before: the variability within each part is fairly consistent, but the amount of dispersion changes significantly with the thickness of the part itself.

Okay, based on everything we've talked about so far, this shows where the components of variation are coming from. As you'd expect, the repeatability, the within-gauge variability, is where most of it comes from, because it's built into the measurement device itself. The two measurement devices are also a little different on average; that's the reproducibility, the top piece of the stacked bar. But mostly they share a common amount of variability within the measurement device itself.
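A simplified version of this split can be sketched as follows: repeatability is the average within-cell variance (same machine, same part), and reproducibility is the variance between the two machines' average offsets, minus the share of repeatability noise that leaks into those averages. This is a method-of-moments sketch under the assumption of a balanced study, not the ANOVA a package like ROIstat may use, so its numbers won't match the stacked bar exactly. The readings are hypothetical.

```python
from statistics import mean, variance

def variance_components(cells):
    """cells maps (machine, part) -> repeated readings of that part on that machine.
    Assumes a balanced study. Returns (repeatability_var, reproducibility_var)."""
    machines = sorted({m for m, _ in cells})
    parts = sorted({p for _, p in cells})
    n_reps = len(next(iter(cells.values())))

    # Repeatability: within-gauge variance, averaged over every machine/part cell.
    repeat_var = mean(variance(r) for r in cells.values())

    # Each machine's average offset from the cross-machine average of every part.
    part_avg = {p: mean(mean(cells[(m, p)]) for m in machines) for p in parts}
    offsets = [mean(mean(cells[(m, p)]) - part_avg[p] for p in parts) for m in machines]

    # Reproducibility: spread of machine offsets, less repeatability noise in the averages.
    reprod_var = max(0.0, variance(offsets) - repeat_var / (len(parts) * n_reps))
    return repeat_var, reprod_var

# Hypothetical: machine B reads 0.5 mm higher than machine A on both parts
cells = {("A", 1): [10.0, 10.2, 9.8], ("A", 2): [20.0, 20.2, 19.8],
         ("B", 1): [10.5, 10.7, 10.3], ("B", 2): [20.5, 20.7, 20.3]}
print(variance_components(cells))
```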

Okay, what impact will this have if I put it into my process? If I put it in my process without ever doing anything beyond a calibration sticker, I'm going to have some problems. Going back to my process variables, I've determined that my product has a standard deviation of 0.17. That's the blue distribution, and comparing it to the spec of plus or minus 0.5 millimeters, I can see the process is minimally capable: I really should expect most of my product to fall within spec.

The problem is that, because of the gauge variability of these two machines, this blue band and this green band are my danger zones. Those are the areas where I have some probability of misclassifying conforming product as non-conforming, or non-conforming product as conforming. The calculations say that out-of-spec product is misclassified as in spec about 0.16% of the time: product that was actually made out of spec gets called in spec. That's bad, because it means you're passing out-of-spec product along to your final customer.

But the gauge also misclassifies in-spec product as out of spec. These parts are perfectly fine, within the specification, but 34 or 35% of the time I'm going to scrap them as if they were out of spec. That means I'm throwing away perfectly good product just because of measurement error, even though this measurement system is in fact stable over time.

Overall, about 34.6% of product is estimated to be misclassified, called out when it's in or in when it's out. The vast majority of that is product that's perfectly fine but that I'm calling out. The problem is that I don't know the reality. If a single measurement says a part is out of spec, I have to act on it and scrap the part, even though I know it's almost certainly in spec.
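The misclassification figures can be approximated with a short numerical integration over the true thickness. This is a sketch under stated assumptions, not ROIstat's calculation: true deviations are normal with the 0.17 mm process standard deviation, centered on nominal; each reading adds independent normal error with one pooled standard deviation, here about 0.507 mm, which is what 261% of a 1 mm spec implies under an assumed 5.15-sigma convention. It ignores bias, the machine-to-machine difference, and the thickness-dependent error, so it won't reproduce the quoted numbers exactly, though it lands close (about 35% of good product rejected and about 0.15% of bad product accepted).

```python
import math

def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(x: float, sd: float) -> float:
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def misclassification(sigma_p, sigma_m, half_tol, steps=20000):
    """Return (P(in spec but called out), P(out of spec but called in)).

    True deviations ~ N(0, sigma_p); each reading adds N(0, sigma_m) error.
    Midpoint-rule integration over +/- 10 sigma_p.
    """
    lo, hi = -10 * sigma_p, 10 * sigma_p
    h = (hi - lo) / steps
    good_called_bad = bad_called_good = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        # chance that one reading of a part whose true deviation is x lands in spec
        p_pass = norm_cdf((half_tol - x) / sigma_m) - norm_cdf((-half_tol - x) / sigma_m)
        w = norm_pdf(x, sigma_p) * h
        if abs(x) <= half_tol:
            good_called_bad += w * (1.0 - p_pass)
        else:
            bad_called_good += w * p_pass
    return good_called_bad, bad_called_good

sigma_m = 2.61 * 1.0 / 5.15  # ASSUMED 5.15-sigma convention: about 0.507 mm
gb, bg = misclassification(sigma_p=0.17, sigma_m=sigma_m, half_tol=0.5)
print(f"good called bad: {gb:.1%}, bad called good: {bg:.2%}")
```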

Now, some of you may have run across this before: you can reduce the measurement error by averaging multiple measurements. The error shrinks with the inverse of the square root of the number of measurements, so if I take four measurements I cut my measurement error in half, and if I take nine I cut it to a third. The problem is that getting this down to a reasonable percent R&R of around 10 means cutting the measurement error by a factor of 25 or 26, which takes hundreds of re-measures of the same sheet to decide whether it's in spec. So in practice I'm probably not going to be able to use this gauge in production to decide whether product goes to my customer; it would require far too many repeat measurements.
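Under the square-root rule, the 25- to 26-fold figure is the reduction needed in the measurement error (261 / 10 is about 26.1); the number of averaged readings is its square. A minimal sketch, assuming %R&R scales linearly with the measurement standard deviation:

```python
import math

def remeasures_needed(current_pct_rr: float, target_pct_rr: float) -> int:
    """Averaging n independent readings divides the measurement standard
    deviation (and so a sigma-based %R&R) by sqrt(n)."""
    reduction = current_pct_rr / target_pct_rr  # required cut in sigma
    return math.ceil(reduction ** 2)

print(remeasures_needed(100, 50))  # 4 readings halve the error
print(remeasures_needed(261, 10))  # 682 readings of the same sheet
```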

This is compounded by the fact that the real issue here is my non-uniform dispersion. Bias I can correct for: I can build an equation and put it into my machine, or apply it by hand, and correct the readings back to the true value. But for variability, all I can do is take multiple measurements. So even if I took the bias out, I'd still be stuck doing a whole bunch of measurements to reduce that variability. And I'd need more measurements at the thicker end than at the thinner end, because there's more variation in measuring those thicker pieces of plastic.

So what was going on? This machine was originally intended for measuring much thinner material, so this uniformity issue, the change in variability with magnitude, never reared its head. But as we go to thicker and thicker product, I get more and more variability. The gauge simply isn't suited to what we're trying to use it for, and we need to find a different solution.

Okay, like I said, the power of a long-term study is that you keep measuring over time. If this gauge had passed the long-term study and we had decided it was okay to use in production, we would keep measuring those eight parts once a day, or once a week, whatever makes sense, and keep tracking the results on our control charts to make sure nothing had changed in the interim.

The frequency is a balance. On one side is how long you can go before a change would become a problem: if going a week between checks would leave too much product suspect when a problem is found, then a week is too long. On the other side is the cost of doing the check. If daily is too costly but weekly is too big a risk, you'd find your balance somewhere in between for how often to re-examine the measurement device.

In my experience, once a day usually isn't overwhelming, because in a case like this I'm only measuring eight parts once a day.

That's what the long-term study buys you: peace of mind that your measurement system today hasn't changed from where it was last week, last month, or even last year. You keep those eight parts, and you have to protect them, because if you throw them away and switch to new parts you have to recalculate everything. You keep those eight parts and keep measuring them over time. If you see a change in any of the control charts, or even in the linearity or uniformity tests, that tells you something in the measurement system has changed, and you need to investigate and fix it before you start making the wrong decisions about your product or your process.

Well, I hope you enjoyed seeing how ROIstat makes those calculations and generates all those graphs for you, and how it can help your business make the right decisions about important measurements, whether they're final product measurements or in-process measurements. It's available as a free download, so check it out and let me know what you think!

Have a great day.





2025 Red Cloud Road
Longmont, CO 80504
