Timer Problems impact fairness of District Race

Stan Pope · Post by **Stan Pope** » Wed May 16, 2012 1:27 pm

Here is a slightly different way to view the statistics that should help us understand the significance of the numbers:

Suppose the analysis lists 10 heats and tells you that 5 of these 10 heats (don't know which ones for sure) are defective to the extent that rankings are probably affected. Your choices are:
1) Let the results stand as they are.
2) Rerun all 10 heats, verifying by some method (to be announced) that those new results are not defective.
3) Rerun the entire race.

In my opinion, if there were 10 heats in a competition of 50 racers with 5 of the heats being defective, we have big problems with the track and it should be reworked carefully to correct the defects. But now, in the middle of a competition, that is not a solution to the immediate problem of ranking the competitors as accurately as possible so that they get the result that they deserve.

I believe that the reruns of the 5 defective runs will change the rankings significantly and place those competitors much closer to the ranking which they deserve. I also believe that the rerun of the 5 non-defective runs may change the rankings slightly in that correct but low times are replaced by times which vary about the respective racer's means, resulting in rankings close to but possibly higher than they deserve. Accordingly, I would rerun the 10 without a second thought!

In the scenario above, suppose that it were highly probable that 3 of the 10 were defective. What then?
What if 7 of the 10 were defective? Where would one "draw the line?"

FatSebastian · Post by **FatSebastian** » Wed May 16, 2012 2:39 pm

Stan Pope wrote:Suppose the analysis lists 10 heats and tells you that 5 of these 10 heats (don't know which ones for sure) are defective...

Does this intend to say that 50% of all heats were flagged as possibly defective? Or does this literally suppose a method of analysis that can declare X=5 of Y=10 heats (from a population greater than Y) are defective, yet the analysis can't discriminate which X of the Y heats are the defective ones?

It would seem that, to declare a defect, an outcome must be subjected to some pass/fail criterion. Thus "defectivity" is established according to whatever criterion was used by the method of analysis. Of course the choice of criterion is discretionary, so different methods and analysts may disagree.

Stan Pope · Post by **Stan Pope** » Wed May 16, 2012 3:22 pm

FatSebastian wrote:
Stan Pope wrote:Suppose the analysis lists 10 heats and tells you that 5 of these 10 heats (don't know which ones for sure) are defective to the extent that rankings are probably affected.
Does this intend to say that 50% of all heats were flagged as possibly defective?

No. There might have been 50 or so heats in all but the analysis flagged those 10 as "as likely defective as not."

FatSebastian · Post by **FatSebastian** » Wed May 16, 2012 4:21 pm

Stan Pope wrote:No. There might have been 50 or so heats in all but the analysis flagged those 10 as "as likely defective as not."

Unfortunately I continue to stumble on the distinction of 5 v. 10 in the description "the analysis lists 10 heats and tells you that 5 of these 10 heats (don't know which ones for sure) are defective..."

Is the supposition that 10 are identified as potentially defective (based on some kind of analyst criterion), but 5 of 10 "are defective to the extent that rankings are probably affected" whereas the other five heats would probably not affect the rankings?

Or is the supposition that the analysis method generates twice as many (10) alarms as true defects to capture all (5) true defects (i.e., results include a high number of false alarms that must also be rerun)?

Or does the analysis suggest that 5 heats are "definitely defective" and 5 are "maybe defective" (whatever "maybe" might mean)? Oh, wait... because we already said we "don't know which ones for sure" are defective, and that the analysis flagged those 10 as "as likely defective as not," we can't claim any are "definitely defective"... can we?

Stan Pope · Post by **Stan Pope** » Wed May 16, 2012 7:15 pm

You are making the question too hard!

Following some plan, e.g. the coincident slowest runs of the competition, I rejected the null hypothesis that the equipment was okay because at 95% confidence there could be up to 5 trials which satisfied, and, in fact, there were ten such hits. They aren't all necessarily bad. Some might have occurred due to random chance.

In this case, there might be more than 5 that are bad, but for purposes of my original question, please ignore how I know that 5 of the listed 10 are defective.... it doesn't matter how I know, so long as I know! The nut of the question has to do with the relative values of the types of reruns!

FatSebastian · Post by **FatSebastian** » Thu May 17, 2012 7:53 pm

Stan Pope wrote:You are making the question too hard! [...] for purposes of my original question, please ignore how I know that 5 of the listed 10 are defective.... it doesn't matter how I know, so long as I know!

Thanks! I was just thrown off by the description. This is because usually, one will never definitively know something when testing a hypothesis statistically; rather, one just expects likely outcomes assuming the null hypothesis is true. So if I understand, in the supposed case we expect five samples to meet/exceed some criterion (per some null hypothesis) but ultimately 10 samples met/exceeded that criterion.

In practice, such problems can often be approached in reverse. Instead of denoting the expected outcome at a prescribed confidence level, a confidence level may be backed out from the observed outcome, and the hypothesis is questioned whenever the inferred confidence isn't "high" enough in the analyst's opinion.
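
As a rough illustration of "backing out" a confidence level, here is a minimal Python sketch. It assumes (my assumption, not something from Stan's write-up) that each heat trips the criterion independently with the same probability under the null hypothesis, so the number of hits is binomial. The names `binomial_tail` and `inferred_confidence` and their parameters are hypothetical.

```python
from math import comb


def binomial_tail(n_heats, p_hit, k_observed):
    """P(X >= k_observed) for X ~ Binomial(n_heats, p_hit).

    Assumption: heats trip the criterion independently, each with
    probability p_hit, when the equipment is working properly.
    """
    return sum(
        comb(n_heats, i) * p_hit**i * (1 - p_hit) ** (n_heats - i)
        for i in range(k_observed, n_heats + 1)
    )


def inferred_confidence(n_heats, p_hit, k_observed):
    """Confidence backed out from the observed number of hits."""
    return 1.0 - binomial_tail(n_heats, p_hit, k_observed)


# Two heats, each with a 50% chance of a hit, and both hit:
# the tail probability is 0.25, so the inferred confidence is 0.75.
print(inferred_confidence(2, 0.5, 2))
```

The analyst then compares the inferred confidence against whatever level he or she considers "high" enough, rather than fixing the level up front.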

Post by **gpraceman** » Wed Sep 19, 2012 12:14 pm

The new version of GrandPrix Race Manager does have an implementation of Stan's timing audit. I would appreciate any feedback from users on this feature. Of course, I want to limit the number of false alarms but still turn up any possible timing anomalies.

I did run quite a few data sets through it, most of those being data from actual races, and did notice some situations where one might get a false alarm.

1) Testing - In a testing situation someone may enter arbitrary times or wave their hands over the timer's lane sensors, so times are not "real world". The odds of a heat having all racers finish with their highest/slowest time greatly increase. The user should know that these are not real times, so should just ignore any timing audit warning.

2) Pit Stops - If cars are not finishing or are just running poorly, the pit crew may perform a relube or adjustment to those cars (or all cars). Any intervention by the race crew can up the odds of a timing audit warning. The first runs of the cars might very well be their slowest.

This feature is documented in the GPRM help file (a too often underutilized resource). Hopefully, this will help the user determine if there is actually a timing problem and if so, to identify and resolve the problem. It is broken down into Pit Stop (car tune ups), Equipment Problems, Environmental Problems, and Improper Operation categories. I would appreciate any feedback on the help file documentation as well.

birddog · Post by **birddog** » Thu Sep 20, 2012 9:04 am

Randy-

Thanks for implementing this in v12 of your software. I've purchased it and started playing with it. I've run a "mock" pack race using the "Test Data" button to generate random times for the races. In using this random data, I did have a few groups for whom the audit triggered. I simply re-ran the heats that the audit was flagging with more random data and I no longer had the audit trigger.

It seemed to work very well in the "mock" races I've run using random data.

thanks,

birddog

FatSebastian · Post by **FatSebastian** » Thu Sep 20, 2012 5:43 pm

gpraceman wrote:Of course, I want to limit the number of false alarms but still turn up any possible timing anomalies.

I think Stan's write-up proposed a 75% confidence level. Is the confidence level a user-defined setting in GPRM? (If not, what is it set to?)

Post by **gpraceman** » Thu Sep 20, 2012 9:01 pm

FatSebastian wrote:I think Stan's write-up proposed a 75% confidence level. Is the confidence level a user-defined setting in GPRM? (If not, what is it set to?)

That is the limit that GPRM is currently set at, as that seemed like a good starting point. It is hard coded for the time being.

Post by **gpraceman** » Fri Sep 21, 2012 11:15 am

gpraceman wrote:
FatSebastian wrote:I think Stan's write-up proposed a 75% confidence level. Is the confidence level a user-defined setting in GPRM? (If not, what is it set to?)
That is the limit that GPRM is currently set at, as that seemed like a good starting point. It is hard coded for the time being.

To clarify a bit. The confidence level is a minimum of 75%. It can end up being much higher, depending on the number of lanes, total number of runs for each racer, total number of racers, and the total number of heats. Those affect the probability that all racers in any given heat will experience their slowest/fastest time.
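
To see why lanes, runs per racer, and heats all matter, here is a minimal Python sketch. It is not GPRM's actual calculation. It assumes each racer's slowest time is equally likely to fall on any of that racer's runs, that racers and heats are independent, and it looks only at the "all slowest" case. The function names and parameters are hypothetical.

```python
def p_heat_all_slowest(lanes, runs_per_racer):
    """Chance that every racer in one heat records their slowest time there.

    Assumption: for each racer, the slowest run is equally likely to be
    any one of their runs, independently of the other racers.
    """
    return (1.0 / runs_per_racer) ** lanes


def confidence_for_one_flag(lanes, runs_per_racer, heats):
    """1 minus the chance that at least one heat is 'all slowest' by luck.

    Assumption: heats are treated as independent, which is only an
    approximation because the same racers appear in several heats.
    """
    p_heat = p_heat_all_slowest(lanes, runs_per_racer)
    p_any_by_chance = 1.0 - (1.0 - p_heat) ** heats
    return 1.0 - p_any_by_chance


# Hypothetical race: 4 lanes, 4 runs per racer, 50 heats.
print(confidence_for_one_flag(lanes=4, runs_per_racer=4, heats=50))
```

Under these assumptions, more lanes or more runs per racer make a coincidental "all slowest" heat rarer (higher confidence), while more heats give chance more opportunities (lower confidence).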

FatSebastian · Post by **FatSebastian** » Fri Sep 21, 2012 7:37 pm

gpraceman wrote:To clarify a bit. The confidence level is a minimum of 75%. It can end up being much higher...

Thanks for the clarification.

A 75% confidence level means that the behavior being flagged still has a 25% probability of occurring by chance without any malfunction. FWIW, beating one-in-four odds hardly indicates an anomaly. For example, the confidence level of seeing at least one "tail" in two coin flips is also 75% (because the chance of flipping a fair coin twice and seeing only "heads" is 25%). Yet no one would suspect a coin of being unfair (e.g., a two-headed counterfeit) just because no "tail" showed up in two flips.
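
The coin arithmetic can be checked with a few lines of Python. This is only a sketch of the analogy, with hypothetical function names; it also shows how many tail-free flips it would take before a stricter threshold is crossed.

```python
def confidence_of_seeing_tail(flips, p_heads=0.5):
    """Chance of at least one tail in `flips` tosses of a coin."""
    return 1.0 - p_heads**flips


def flips_needed(target_confidence, p_heads=0.5):
    """Smallest number of flips whose confidence reaches the target."""
    flips = 1
    while confidence_of_seeing_tail(flips, p_heads) < target_confidence:
        flips += 1
    return flips


print(confidence_of_seeing_tail(2))  # 0.75
print(flips_needed(0.75))            # 2
print(flips_needed(0.95))            # 5
```

So with a fair coin, it takes five straight "heads" before the run would be flagged at a 95% threshold, versus only two at 75%.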

Because a higher confidence threshold could be preferred by some race managers desirous of minimizing false alarms, it might be nice to at least report the confidence/significance level at which an audit flag is being triggered, so a race manager can decide whether to rerun based on the probability level used to signal an audit.

Post by **gpraceman** » Mon Sep 24, 2012 3:09 pm

FatSebastian wrote:Because a higher confidence threshold could be preferred by some race managers desirous of minimizing false alarms, it might be nice to at least report the confidence/significance level at which an audit flag is being triggered, so a race manager can decide whether to rerun based on the probability level used to signal an audit.

That is a good suggestion. In the next update to GPRM the confidence level will be reported when a warning is triggered. I am also adding the ability to set the minimum confidence level limit, if a race coordinator so wishes. I think I will also raise the default minimum confidence level to 95% to keep the false alarms down and be more in line with statistical convention.

In a couple of data sets I did get what I believed to be false alarms. One was the first heat for the racers in question and they all received their slowest time of the race. The other was the last heat for a set of racers and they all received their fastest times. Looking at their different heat times for the whole race, their times did not appear at all out of sorts. It seems reasonable to me that these two particular cases can be attributed to lube break-in, not a timing problem.

Post by **gpraceman** » Tue Sep 25, 2012 5:00 pm

gpraceman wrote:In the next update to GPRM the confidence level will be reported when a warning is triggered. I am also adding the ability to set the minimum confidence level limit, if a race coordinator so wishes. I think I will also raise the default minimum confidence level to 95% to keep the false alarms down and be more in line with statistical convention.

This update has been posted. The Minimum Confidence Level can be set on the Advanced Software Options screen. The default value is 95%.
