A depressing thread

  • You don't have a "false negative" in statistics. You "fail to reject the null hypothesis."

    • Then how can you have a false positive?

      • A false positive is when you mistakenly reject a null hypothesis that is actually true.

        • Easy. A false positive is where your data shows a statistical correlation when there actually isn't one. If you test at a 95% confidence level (a 5% significance level), you can expect a false positive about 5% of the time when the null hypothesis is actually true.
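
          For example, a rough simulation of that 5% figure (just a sketch, assuming a two-sample t-test at the 5% level on two groups that really do come from the same distribution):

          ```python
          # Estimate the false-positive rate when the null hypothesis is true:
          # compare two samples drawn from the SAME distribution many times and
          # count how often the test comes up "significant" at the 5% level.
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(0)
          n_experiments = 10_000
          false_positives = 0

          for _ in range(n_experiments):
              a = rng.normal(0.0, 1.0, size=50)     # group 1
              b = rng.normal(0.0, 1.0, size=50)     # group 2, same distribution
              _, p_value = stats.ttest_ind(a, b)    # two-sample t-test
              if p_value < 0.05:
                  false_positives += 1              # rejected a true null hypothesis

          print(false_positives / n_experiments)    # comes out close to 0.05
          ```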

          • So a false negative should really be called a "failure to reject the null hypothesis"?

            Interesting.

            • No, there is no such thing as a false negative. If you fail to reject the null hypothesis, you didn't get anything out of that data.

              Kuci, please correct me if I'm wrong because I practically slept through prob&stat.

              • So failure to reject a true null hypothesis is a false positive.

                Failure to not reject a false null hypothesis is meaningless data.

                I'll just wait for Kuci's response on the statistical methodology. I'm not saying Kuci is wrong. In fact, I wouldn't mind him being right. I'll just get more out of his response.

                • Please ignore regex and HC, their answers are not quite what I was getting at.

                  Originally posted by DaShi View Post
                  Then please explain the difference.
                  When we conduct a statistical test, we have two hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis is, generally speaking, "there is no relationship between these two things". The alternative hypothesis is, generally speaking, "there is some (possibly specific) relationship between these two things".

                  In context, the hypotheses are:

                  null: "Agent Orange does not cause birth defects"
                  alternative: "Agent Orange does cause birth defects"

                  We then create some sort of experimental design like "look at a group of people exposed to Agent Orange and see how many birth defects their children had". We then compare that number to the number of birth defects we would expect to see in a completely normal population.

                  Of course, we don't want to just conclude "yes, Agent Orange causes birth defects" if we see more than expected - we should see more than expected half of the time even if the null hypothesis is true! If we relied on that standard of evidence then we would believe a great many things that just aren't true.

                  Almost universally* the standard of "statistical significance" is taken to be a 95% confidence level (equivalently, a 5% significance level); what that means is that if we see a result so extreme that it would only happen 5% of the time if the null hypothesis were true, then we will take the result to be "significant" and believe it.

                  *this varies somewhat between disciplines

                  This may seem like a high barrier of evidence to you, but in context it is actually quite low. Many, many experiments are performed using these tests; if we believe anything that meets the 5% standard, we will still probably believe many, many false things. Even within a single study, many of these statistical tests might be performed (the Italian study mentioned a few pages back conducted over 100). A study that conducted 20 different statistical tests would have a large chance of finding a "significant" result even if the null hypothesis were true.
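
                  To put a rough number on that last point (a quick sketch, assuming the tests are independent and each is run at the 5% level):

                  ```python
                  # Chance of at least one "significant" result among k independent
                  # tests at the 5% level when every null hypothesis is actually true.
                  for k in (20, 100):
                      print(k, round(1 - 0.95 ** k, 4))
                  # 20  -> 0.6415  (a study running 20 tests)
                  # 100 -> 0.9941  (a study running over 100 tests)
                  ```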

                  So, regarding the specific studies referenced in #267:

                  The Air Force conducted a study and found no significant relationship; if the null hypothesis is true there is a 95% chance of that happening. The researchers took the Air Force's data and applied at least one new statistical test to it and found one significant result*. If the null hypothesis is true there is a 5% chance of this happening. We can actually look at these two results together and figure out what the new "correct" answer is given a 5% standard: the answer is 2*0.05*0.95 + 0.05*0.05 = 0.0975 = 9.75% (the chance of getting at least one significant result out of the two tests when the null hypothesis is true), which is not significant at our 5% standard.

                  *I'm inferring the "one" from the language of the YaleNews article

                  That was all assuming they only performed one statistical test. If they conducted multiple (which is very likely) then their other result would be even less significant.
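
                  As a quick check on that arithmetic (a sketch, again assuming the two tests are independent):

                  ```python
                  # Probability that at least one of two independent 5%-level tests is
                  # "significant" when the null hypothesis is true; the decomposition
                  # matches the 2*0.05*0.95 + 0.05*0.05 figure above.
                  p = 0.05
                  exactly_one = 2 * p * (1 - p)    # one test significant, the other not
                  both = p * p                     # both tests significant
                  print(exactly_one + both)        # 0.0975
                  print(1 - (1 - p) ** 2)          # same value, 0.0975
                  ```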

                  Even worse, they may have conducted many tests but only published the one that provided a significant result. This is referred to (pejoratively) as "data mining" and is the reason that a study that simply applies new tests to old data is particularly suspect. The data mining might not even be intentional - maybe a dozen different researchers have looked at this data over the years, and these are the first to discover a new significant result. If the others didn't publish then we would falsely believe the result to be much more significant than it actually is.

                  • By the way, these are not just theoretical problems. As I referenced a while back in this thread, in many sciences (I know specifically medicine, psychology, and econometrics) there is growing awareness that the current model does not work because of this problem. There have been a number of meta-studies in these fields that find that over half of published "significant" results are not reproducible; when the researchers try to conduct the same experiment they find a non-significant result.

                    • Ah, I see. So if you run several different kinds of tests on the same data and only reject the null hypothesis half the time, it is more reasonable to assume that there simply is no significance.

                      I still stand by statistics be frustrating.

                      • Originally posted by DaShi View Post
                        Ah, I see. So if you run several different kinds of tests on the same data and only reject the null hypothesis half the time, it is more reasonable to assume that there simply is no significance.
                        It depends on how many you run. If you run 100 tests and 50 of them are significant at a 5% level, then in aggregate that is still significant at a 5% level. If you run 2 and only 1 is significant, then nope.

                        The worst case is when you run 1 test and find that it is significant at a 5% level, but don't know that 19 other people ran the same test, found it to be insignificant, and didn't publish.

                        Originally posted by DaShi View Post
                        I still stand by statistics be frustrating.
                        I don't disagree. The subtleties of statistical testing - where the significance of a given result depends most importantly on what other tests you have run - are unintuitive.
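
                        A rough way to see the difference (a sketch, assuming independent tests that each have a 5% false-positive rate when the null hypothesis is true):

                        ```python
                        # How surprising is each outcome if every null hypothesis is true?
                        from scipy import stats

                        # Chance of 50 or more significant results out of 100 tests:
                        # effectively zero, so that outcome is significant in aggregate.
                        print(stats.binom.sf(49, 100, 0.05))

                        # Chance of 1 or more significant results out of 2 tests: happens
                        # 9.75% of the time, so it does not clear a 5% standard.
                        print(stats.binom.sf(0, 2, 0.05))    # 0.0975
                        ```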

                        • I found statistics courses to be a lot more work than calculus courses.

                          • Originally posted by Kuciwalker View Post
                            It depends on how many you run. If you run 100 tests and 50 of them are significant at a 5% level, then in aggregate that is still significant at a 5% level. If you run 2 and only 1 is significant, then nope.

                            The worst case is when you run 1 test and find that it is significant at a 5% level, but don't know that 19 other people ran the same test, found it to be insignificant, and didn't publish.
                            Thank you. This has been very informative.

                            • Originally posted by Kuciwalker View Post
                              It depends on how many you run. If you run 100 tests and 50 of them are significant at a 5% level, then in aggregate that is still significant at a 5% level. If you run 2 and only 1 is significant, then nope.
                              I think this is a typo--it should read "100 tests and 5 are significant", not 50, right?

                              • No.
