Friday, November 30, 2007

The Ten Statistical Commandments

1) Thou shalt log thy data! We live in a multiplicative world, which means our data live in a log world. Always log any data with a lower zero bound, unless there's also an upper bound, in which case thou shalt perform a logit transformation. Log until proven linear, and be holy.
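A minimal sketch of both sacraments, using hypothetical data (the masses and proportions below are made up for illustration):

```python
import numpy as np

# Hypothetical positive-valued data (e.g., body masses): bounded below by zero,
# so the commandment says to log them.
masses = np.array([0.5, 1.2, 3.7, 11.0, 42.0])
log_masses = np.log(masses)

# Hypothetical proportions (e.g., survival rates): bounded by both 0 and 1,
# so the commandment says logit, not log.
props = np.array([0.1, 0.25, 0.5, 0.8, 0.95])
logit_props = np.log(props / (1 - props))
```

Note that the logit maps 0.5 to exactly 0, and stretches values near the bounds out toward plus/minus infinity.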

2) Thou shalt run non-parametric tests! If the parametric and non-parametric tests come out the same, thou hast lost nothing. If they don't, the data are non-normal, the parametric test is wrong, and thou shalt use the non-parametric result. Spearman, Mann-Whitney, and Kolmogorov-Smirnov are the Holy Trinity (or Quintinity, or whatever). Worship them!
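The Holy Trinity in SciPy, on simulated skewed (lognormal) samples where normality assumptions fail — the data and parameters here are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated lognormal samples: decidedly non-normal
x = rng.lognormal(mean=0.0, sigma=1.0, size=50)
y = x ** 2 * rng.lognormal(mean=0.0, sigma=0.2, size=50)  # monotone but nonlinear in x
b = rng.lognormal(mean=0.8, sigma=1.0, size=50)           # shifted second sample

rho, p_rho = stats.spearmanr(x, y)       # rank correlation: unbothered by the nonlinearity
u_stat, p_mw = stats.mannwhitneyu(x, b)  # rank-based two-sample location test
d_stat, p_ks = stats.ks_2samp(x, b)      # compares the whole distributions
```

Spearman sees the monotone relation that Pearson's r would understate, and the rank-based two-sample tests need no normality at all.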

3) Thou shalt disdain p-values! p = 0.05 is a heathen idol, and ANOVAs are for those who have not yet seen the light, still dwelling in the darkness of obsessive frequentist hypothesis testing. Remember, if thou hast enough data anything will turn significant, no matter how small the difference. And the "significance level" is whatever thou choosest it to be, not what someone tells thee it should be. So, describe data, don't just test data. Don't merely ask whether there's a significant difference, ask what is the difference, why is there a difference, and have I confidence in that difference?
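A quick demonstration of "enough data makes anything significant," with simulated populations whose true means differ by a trivial 0.01 standard deviations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 1_000_000
# Two huge samples; the true difference (0.01 SD) is scientifically negligible
a = rng.normal(0.00, 1.0, n)
b = rng.normal(0.01, 1.0, n)

t_stat, p = stats.ttest_ind(a, b)
diff = b.mean() - a.mean()  # describe the difference itself, not just the p-value
```

The p-value comes out vanishingly small even though the difference is meaningless — which is exactly why one should report and interpret the difference, not just the verdict.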

4) Thou shalt worship the almighty power! Despite the preceding commandment, accepting the null hypothesis is a vile, ungodly thing. Always make sure thou hast the statistical power and a small enough difference relative to what thou carest about to argue that a difference doesn't matter (not just that it isn't "significant"). When in doubt, find a power calculator on the web and do a proper power analysis.
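In place of a web calculator, here is a sketch of a power analysis for a two-sided two-sample test, using the standard normal approximation (effect size is Cohen's d; equal group sizes assumed):

```python
from scipy.stats import norm

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation).

    effect_size: Cohen's d. Assumes equal group sizes and known variance,
    so this is a sketch, not a substitute for an exact calculation.
    """
    se = (2 / n_per_group) ** 0.5
    z_crit = norm.ppf(1 - alpha / 2)
    z = effect_size / se
    # Probability of rejecting in either tail under the alternative
    return norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)

pw = power_two_sample(effect_size=0.5, n_per_group=64)  # the textbook ~0.80 case
```

With a medium effect (d = 0.5) and 64 per group, power lands near the conventional 0.80 — which is what lets thee argue that a *non*-significant difference genuinely doesn't matter.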

5) Thou shalt abhor tiny little time series! All too often people are seduced by "trends" of two or three data points, damning themselves to eternal hellfire. The two-tailed probability of a flawless "trend" with six points is 0.0625 (!). "Before" and "after" comparisons are no better than a single coin flip, unless the points in each category have significantly different averages. Coincidences are often coincidences: if (say) the biggest extinction happened in the same interval as the biggest climate change, and there are ten intervals, well, p = 0.10. So, demand that a time series analysis include a healthy number of data points, at least a dozen or a score or a cubit.
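The 0.0625 figure follows from treating each successive step as an independent coin flip: a "flawless" trend of n points means all n − 1 differences share the same sign.

```python
def flawless_trend_p(n_points):
    """Two-tailed probability of a perfectly monotonic run of n points,
    under a null where each successive step is up or down with prob 1/2."""
    one_tailed = 0.5 ** (n_points - 1)  # all n-1 steps in one direction
    return 2 * one_tailed               # ...or all in the other

p6 = flawless_trend_p(6)  # the commandment's six-point case
```

Six points gives 2 × (1/2)⁵ = 0.0625 — not even significant at the heathen idol's own 0.05 level.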

6) Thou shalt difference thy data! Time series data are almost always autocorrelated (and thou shalt test for that). Still, people insist on interpreting "trends" shared by pairs of time series as meaningful cross-correlations, even though autocorrelation makes finding these demonic things the null hypothesis! Even random walks produce such patterns! FEAR YE SINNERS! The easiest and most powerful way to remove the autocorrelation is to take first differences. So, the next time thou wantest to correlate population growth with the rate of sea-floor spreading - and people will - difference thy !@#$% data.
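A sketch of the demon and the exorcism: two completely independent simulated random walks often show a large correlation in their levels, but differencing sends it back to the void.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
# Two independent random walks: no real relationship whatsoever
x = np.cumsum(rng.normal(size=n))
y = np.cumsum(rng.normal(size=n))

r_levels = np.corrcoef(x, y)[0, 1]                    # often spuriously large
r_diffs = np.corrcoef(np.diff(x), np.diff(y))[0, 1]   # near zero, as it should be
```

The levels correlation varies wildly from run to run (that is the spurious-correlation problem); the first-differenced correlation hugs zero, because the differences really are independent.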

7) Thou shalt not play with PCA! Principal components analysis assumes linear responses of observed variables to underlying variables, but most ecological data show modal responses. Vain mortal, what power grants thee the right to assume linearity? Correspondence analysis can handle both kinds of responses and works wonderfully on modal data (we won't mention that nasty little arch effect...).
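A sketch of why, using simulated modal data: species with Gaussian response curves along a single environmental gradient, fed to PCA (via SVD). All parameters below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
gradient = np.linspace(0, 10, 60)   # sites along one environmental gradient
optima = np.linspace(0, 10, 12)     # each species peaks somewhere along it
# Modal (unimodal, Gaussian) species responses -- the case PCA handles badly
abundance = np.exp(-0.5 * ((gradient[:, None] - optima[None, :]) / 1.5) ** 2)

# PCA via SVD on the centered site-by-species matrix
centered = abundance - abundance.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
scores = u * s  # site scores on the principal axes

# Plot scores[:, 1] against scores[:, 0] and behold the horseshoe:
# axis 2 is roughly a quadratic function of axis 1, not a second gradient.
```

The second axis is an artifact of forcing linear loadings onto modal responses — the same distortion that, in correspondence analysis, survives in milder form as that nasty little arch.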

8) Thou shalt not cluster shamelessly! The world is full of fuzziness and apostasy, not cool, clean Platonic categories. But cluster analysis imposes categories on data regardless of whether they're gradiential. If the clusters are really there, thou shalt see them as a ray of divine light in the shadowy purgatory of a multivariate ordination space. So why bother?
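A sketch of the sin: hand k-means perfectly gradiential (uniform) simulated data with no clusters in it at all, and it will dutifully hand back k "clusters" anyway.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(3)
# Perfectly gradiential data: 300 points uniform on [0, 1], no clusters at all
data = np.sort(rng.uniform(0, 1, 300)).reshape(-1, 1)

# k-means imposes k = 3 categories regardless
centroids, labels = kmeans2(data, k=3, seed=3, minit='++')
```

Three tidy groups emerge from a featureless continuum — which is why one should first look for the clusters in ordination space before believing them.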

9) Thou shalt stand awe-struck before the shining brilliance of the G-test! Chi-square this, chi-square that. The G is easier to compute, it doesn't blow up as easily because of small values, it depends on the awesome power of the log transform, it stands for "GOD," and most importantly it's a maximum likelihood ratio...
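In SciPy both tests come from the same function; the G-test (log-likelihood ratio) is requested with `lambda_="log-likelihood"`. The 2×2 table below is hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 10],
                  [20, 25]])  # hypothetical 2x2 contingency table

# Classic Pearson chi-square
chi2, p_chi2, dof, expected = chi2_contingency(table)
# G-test (maximum likelihood ratio): same call, different divergence statistic
g, p_g, dof_g, _ = chi2_contingency(table, lambda_="log-likelihood")
```

The two statistics are asymptotically equivalent, but G is additive across subtables and is the genuine likelihood-ratio statistic, which is the point of the next commandment.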

10) Thou shalt sing the praises of likelihood, not "fit"! Anyone can design another fit statistic. Why minimize the sum of squares instead of the sum of cubes or just the sum of differences? None of this has a theoretical basis without a notion of probability, and specifically of likelihood. After all, that's what the divine theologian Popper said.
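The one fit statistic that does have a theoretical basis: minimizing the sum of squares is the same thing as maximizing a Gaussian likelihood. A sketch with simulated data, estimating a location parameter over a grid of candidates:

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(3.0, 1.0, 100)   # simulated observations

grid = np.linspace(0, 6, 601)      # candidate location estimates

# Sum of squared errors for each candidate...
sse = ((data[:, None] - grid[None, :]) ** 2).sum(axis=0)
# ...and the Gaussian log-likelihood (sigma fixed at 1) for each candidate
loglik = -0.5 * sse - 0.5 * len(data) * np.log(2 * np.pi)

best_sse = grid[np.argmin(sse)]    # least-squares answer
best_ml = grid[np.argmax(loglik)]  # maximum-likelihood answer: identical
```

Least squares is holy only because, under Gaussian errors, it *is* maximum likelihood; the sum of cubes or of plain differences corresponds to no probability model at all.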

Original Source: http://www.nceas.ucsb.edu/~alroy/JA_commandments.html
