When are numbers facts, when are they statistics, and when are they relevant? These questions frequently bedevil courts and parties alike. Let’s take a look.

Cheri Hutson sued her employer, Federal Express, for sex discrimination, claiming that she had been denied promotion on account of her sex. The case came up for trial, and Federal Express filed a number of motions *in limine*. A motion *in limine* is a motion filed shortly before trial, asking the Court to make a ruling concerning some issue that is going to come up at trial. The most common motions *in limine* seek evidentiary rulings.

One of FedEx’s motions asked Judge Anderson (who presided over the case) to prevent Hutson from introducing into evidence the fact that plaintiff’s manager had never, or almost never, hired a woman into a managerial position. According to the Court’s opinion, women applied for manager positions 16 times during the tenure of plaintiff’s manager, and only once was a woman hired.

Well, that seems pretty relevant in a sex discrimination case, doesn’t it? Indeed it does, and indeed it is. Not only is it a matter of common sense, but the Supreme Court, in the most important discrimination opinion on the books, stated that the defendant’s “general policy and practice with respect to minority employment” would be relevant in a discrimination case. It went on to endorse the use of “statistics as to petitioner’s employment policy and practice” to determine whether the refusal to hire the plaintiff “conformed to a general pattern of discrimination.” But Judge Anderson ruled in favor of Federal Express and refused to allow plaintiff to introduce this evidence.

Why? Judge Anderson wrote that “statistics are valid and helpful in a discrimination case only to the extent that

“the methodology and the explanatory power of the statistical analysis sufficiently permit an inference of discrimination. Specifically, the statistics must show a significant disparity and eliminate the most common nondiscriminatory explanations for the disparity.”

He went on to note that the small “sample size” of the plaintiff’s statistics eliminated their probative value. Probative value is just legal mumbo jumbo for the ability of evidence to prove something. In other words, the Judge ruled, in essence, that the fact that plaintiff’s manager hired only 1 woman in 16 hiring decisions didn’t prove anything.

Ahem.

The case went to trial and plaintiff lost. She has filed a motion for a new trial, citing the exclusion of this evidence as one of the grounds.

Was Judge Anderson’s decision correct? I say no. In my view, he got caught up in the confusion between numbers as facts and numbers as statistics. That confusion stems from a misconception that can be expressed as the following logical fallacy:

- Statistics can be used to prove discriminatory intent. Such statistics, however, are admissible only if they satisfy rigorous statistical tests necessary to render them reliable. (This is true.)

- Statistics are derived from numbers. (This is true.)

- Numbers are not admissible unless they satisfy rigorous statistical tests necessary to render them reliable. (This is false.)

In other posts, I have alluded to the fact that defendants have been successful in foisting a number of dubious “doctrines” on the Courts. The idea that numbers must meet certain statistical requirements in order to be admissible is one of those dubious doctrines.

If Hutson’s manager hired only 1 woman in 16 hiring situations, that is a fact. The only test that it should have to meet is the general test for relevance: does it make a material disputed fact more or less likely to be true? The material disputed fact here is whether Hutson’s manager discriminated against women. If he did, then we would expect there to be few women in managerial positions he filled. If he did not, we would expect there to be a representative number of women in positions he filled. Patently the number of women in managerial positions filled by Hutson’s manager is relevant to the issue of whether he discriminates against women.

Where did Judge Anderson go wrong? He went wrong by confusing numbers as facts and numbers as statistics. In some cases, a plaintiff will want to use statistics as the only evidence of discriminatory intent; in others, the numbers merely corroborate other evidence. What is the difference? In *Hutson*, plaintiff alleged that an objectively less qualified man was given the promotion instead of her. That fact alone is sufficient to prove discrimination. The numbers that she sought to introduce are additional evidence that bolsters her claim. In *Bender v. Hecht’s Dep’t Stores*, 455 F.3d 612 (6th Cir. 2006), one of the cases relied on by Judge Anderson, the plaintiffs alleged that they were chosen for layoff in a downsizing because of their age. The evidence offered by the plaintiffs was the fact that the average age of the individuals with their job title was 41.7 years old, while the average age of the employees who were laid off was 43.4 years. What the plaintiffs in *Bender* did not say was “X should have been laid off instead of me.” In other words, they weren’t comparing themselves to other employees; they merely claimed that the process was discriminatory. You can see the difference.

In my view, *Bender* was right (on this point) for the wrong reason, because the evidence in that case was also numbers as facts and not numbers as statistics. In a discrimination case, true statistics deal with probabilities; namely the probability that a certain event was caused by discrimination versus something else. Let’s say that your employer makes employment decisions by flipping a magic coin. If the coin lands on heads, it’s one decision, and if it lands on tails, it’s another. What makes the coin magic is that if the flipper has discrimination in his or her heart, the coin will land only on heads.

So let’s suppose that a manager has to decide who to hire, and the choice is between a man and a woman. The manager flips the magic coin. If the manager does not discriminate, it is equally likely that a man or a woman will be hired. If the manager discriminates, the coin will land only on heads and a man will be hired. The coin is flipped, and it lands on heads. Did it land on heads because the manager discriminated, or just by chance? We can’t tell.

There’s another job opening, and the coin again lands on heads. There is a one in four chance of getting heads twice in a row. It could still be just chance. After another job opening it’s heads again. The odds of three heads in a row are one in eight, or 12.5 percent. Now, here is where statisticians differ from ordinary mortals. You and I may well say that it’s got to be discrimination, but statisticians are more cautious. When there is a one in eight chance of something happening, it *is* going to happen from time to time, so three heads in a row is not unusual enough for a statistician to conclude that it can’t be chance. Think about it: if there were a one in eight chance that your son was going to crash the car, would you ever let him drive it? Very few of us would take any serious risk on one in eight odds.

At what point do statisticians say enough is enough, i.e., that a result is “statistically significant”? The short answer is at 5 percent (one in twenty). The long answer is “it depends,” but we can ignore that for present purposes. Four heads in a row (one in sixteen) is still above that line; it takes five consecutive heads (one in thirty-two, about 3 percent) to drop below it.
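The coin-flip arithmetic can be checked with a few lines of Python. This is only a minimal sketch, assuming a fair coin and the conventional 5 percent cutoff; the function names are made up for illustration.

```python
from fractions import Fraction


def prob_all_heads(flips: int) -> Fraction:
    """Chance that a fair coin lands on heads on every one of `flips` tosses."""
    return Fraction(1, 2) ** flips


def flips_needed(threshold: Fraction) -> int:
    """Smallest run of consecutive heads whose chance falls below `threshold`."""
    flips = 1
    while prob_all_heads(flips) >= threshold:
        flips += 1
    return flips


# Print the odds for runs of one to five heads.
for n in range(1, 6):
    p = prob_all_heads(n)
    print(f"{n} heads in a row: 1 in {p.denominator} ({float(p):.1%})")

# How long a run is needed to get below one in twenty?
print(flips_needed(Fraction(1, 20)))
```

The last line prints 5: a run of four heads (1 in 16) is still more likely than one in twenty, and a run of five (1 in 32) is the first that is not.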

Let’s get back to the *Hutson* case. 1 in 16 hires was a woman. 1 in 16 is more than 5 percent, so it’s not statistically significant, is it? Not so fast. If each hire is a coin toss, what are the odds that the coin would land on heads 15 times and tails only once? A lot less than 1 in 16 or 1 in 20: only 16 of the 65,536 equally likely sequences of 16 flips contain exactly one tail, which works out to about 1 in 4,096. It would be very rare to flip a coin 16 times and end up with 15 heads and 1 tail. It would be statistically significant.
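For readers who want to verify the 16-flip figure, here is a short sketch. It assumes, as the coin analogy does, that each hire is an independent toss of a fair coin; that is an idealization, not a description of the actual applicant pools. The helper names are hypothetical.

```python
from math import comb


def prob_exact_tails(flips: int, tails: int) -> float:
    """Chance of exactly `tails` tails in `flips` tosses of a fair coin."""
    return comb(flips, tails) / 2 ** flips


def prob_at_most_tails(flips: int, tails: int) -> float:
    """Chance of `tails` or fewer tails, the figure a significance test would use."""
    return sum(comb(flips, k) for k in range(tails + 1)) / 2 ** flips


exact = prob_exact_tails(16, 1)
at_most = prob_at_most_tails(16, 1)
print(f"exactly 1 tail in 16 flips: {exact:.6f}")
print(f"at most 1 tail in 16 flips: {at_most:.6f}")
print("below the 5 percent cutoff:", at_most < 0.05)
```

Either way of counting lands far below one in twenty, which is the point: the raw ratio of 1 in 16 and the probability of seeing that ratio by chance are two different numbers.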

What is this “sample size” that Judge Anderson was talking about? Let’s say that you wanted to predict who is going to win an election. One way to do that would be to ask each and every voter how he or she was going to vote. Assuming the voters tell the truth and don’t change their minds, you would have a very accurate prediction. But usually it’s not possible to poll every single voter. Statisticians (bless their hearts) have figured out how to predict the characteristics of a large group (called a “population”) by looking at the characteristics of a small portion of that group, and that portion is called a “sample.” Basically, for a population of “x” members, a random sample of “y” members will predict the composition of the population to a “z” degree of certainty.

When Judge Anderson referred to plaintiff’s evidence concerning the number of male versus female hires as a “small sample size,” he was just wrong. The numbers represented the entire population, so it was meaningless to talk in terms of sample size.

Although I believe that Judge Anderson should have allowed plaintiff to introduce the evidence, for a proper analysis we need to go into a little more depth. I took a look at the motion papers filed in *Hutson*, and it’s not clear to me that the “1 woman in 16 hiring decisions” figure referred to by Judge Anderson was correct. (The motion papers are not a model of clarity on this point.) What I gleaned was that there were five occasions on which both men and women applied for a manager’s position and there was at least one successful candidate. On each occasion, there were significantly more men than women applying for the position. All told, 54 men and 11 women applied. Seven men and no women were hired. According to my calculations, the odds of this happening by chance (i.e., in the absence of discrimination, everything else being equal) are 31.7 percent. Not statistically significant, but certainly relevant.
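A rough cross-check of the order of magnitude is possible with a deliberately simplified model. The sketch below lumps all 65 applicants into a single pool and picks 7 of them at random, which is an assumption made purely for illustration: the five occasions each had their own applicant pool, and the occasion-by-occasion numbers are not given here. It is therefore not the calculation behind the 31.7 percent figure, and the two need not agree.

```python
from math import comb


def prob_no_women_hired(men: int, women: int, hires: int) -> float:
    """Chance that `hires` people picked at random, without replacement,
    from one pooled group of applicants are all men (hypergeometric)."""
    return comb(men, hires) / comb(men + women, hires)


# Pooled totals from the motion papers: 54 men, 11 women, 7 hires.
p = prob_no_women_hired(men=54, women=11, hires=7)
print(f"chance of hiring no women under the pooled model: {p:.1%}")
```

Under that simplification the result comes out at roughly 25 percent: a different number, but the same conclusion, since it is nowhere near the 5 percent threshold.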

If only one woman had been hired, however, the numbers would be perfectly in line with the odds. This brings us back to what Judge Anderson meant when he wrote “small sample size.” Change that to “small population size” and his reservation is valid. Small populations generally will not prove much of anything, because small changes in the numbers have such a big effect. If you roll dice only three times, you can’t really tell if the dice are loaded or not. Roll them a thousand times, and you’ll know.
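To see how much a single hire moves things in a group this small, consider the following sketch. It again assumes, for illustration only, that all 54 men and 11 women form one pool from which 7 hires are drawn at random; the real hiring took place over five separate occasions, so treat the output as a ballpark.

```python
from math import comb

MEN, WOMEN, HIRES = 54, 11, 7  # pooled totals from the motion papers


def prob_women_hired(k: int) -> float:
    """Chance that exactly `k` of the hires are women under random selection."""
    return comb(WOMEN, k) * comb(MEN, HIRES - k) / comb(MEN + WOMEN, HIRES)


# Average number of women hired if sex played no role.
expected_women = HIRES * WOMEN / (MEN + WOMEN)
print(f"expected number of women hired: {expected_women:.2f}")

for k in range(4):
    print(f"{k} women hired: {prob_women_hired(k):.1%}")
```

With a little over one woman expected among seven hires, one woman hired is the single most likely outcome in this model, and zero is not far behind. One hiring decision is the whole difference between “in line with the odds” and “no women at all,” which is exactly the trouble with small populations.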

That being said, I still think that the jury should have seen those numbers. After all, they are perfectly consistent with discrimination. More importantly, they represent facts, things that actually happened. Although the numbers may not be statistically significant, the test is relevance, not statistical significance. Relevance is the “tendency” to make a fact more “probable,” and the numbers clearly do that. Relevant evidence is admissible, and the probative value is for the jury to determine. You can be sure that Federal Express would have wanted to introduce the evidence if three women and only four men had been hired. The Supreme Court has said that the hiring practices of defendant are admissible. The admissibility does not depend on what those hiring practices were.

Will Hutson win her motion for a new trial? Not likely. She made the motion because she is required to if she wants to appeal. Will she win her appeal? I can’t say, because I haven’t read the trial transcript, but I don’t think Judge Anderson’s ruling on the evidence of FedEx’s hiring practices would be considered reversible error. Isn’t that unfair? Yes, indeed it is.