Friday, November 30, 2007

The Ten Statistical Commandments

1) Thou shalt log thy data! We live in a multiplicative world, which means our data live in a log world. Always log any data with a lower zero bound, unless there's also an upper bound, in which case thou shalt perform a logit transformation. Log until proven linear, and be holy.
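
A minimal sketch of both transformations (my addition, in Python with NumPy assumed; the data are illustrative):

import numpy as np

abundances = np.array([1.0, 3.0, 10.0, 30.0, 100.0])    # lower bound at zero
proportions = np.array([0.05, 0.20, 0.50, 0.80, 0.95])  # bounded in (0, 1)

log_abundances = np.log(abundances)                     # multiplicative -> additive
logit_props = np.log(proportions / (1 - proportions))   # maps (0, 1) onto the real line

print(log_abundances)
print(logit_props)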

2) Thou shalt run non-parametric tests! If the parametric and non-parametric tests come out the same, thou hast lost nothing. If they don't, the data are non-normal, the parametric test is wrong, and thou shalt use the non-parametric result. Spearman, Mann-Whitney, and Kolmogorov-Smirnov are the Holy Trinity (or Quintinity, or whatever). Worship them!
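
A sketch of the comparison in Python (SciPy assumed; the lognormal samples are deliberately non-normal and purely illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=50)
y = rng.lognormal(mean=0.5, sigma=1.0, size=50)

print(stats.ttest_ind(x, y))       # parametric: assumes normality
print(stats.mannwhitneyu(x, y))    # rank-based location test
print(stats.ks_2samp(x, y))        # distribution-shape test
print(stats.spearmanr(x, y))       # rank correlation (for paired data)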

3) Thou shalt disdain p-values! p = 0.05 is a heathen idol, and ANOVAs are for those who have not yet seen the light, still dwelling in the darkness of obsessive frequentist hypothesis testing. Remember, if thou hast enough data anything will turn significant, no matter how small the difference. And the "significance level" is whatever thou choosest it to be, not what someone tells thee it should be. So, describe data, don't just test data. Don't merely ask whether there's a significant difference, ask what is the difference, why is there a difference, and have I confidence in that difference?
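
One way to describe rather than merely test, sketched in Python (my addition; NumPy assumed, data and bootstrap settings illustrative): report the size of the difference with a confidence interval instead of a bare p-value.

import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(10.0, 2.0, size=200)
b = rng.normal(10.3, 2.0, size=200)

observed = b.mean() - a.mean()
boot = np.array([
    rng.choice(b, size=b.size).mean() - rng.choice(a, size=a.size).mean()
    for _ in range(5000)
])                                   # bootstrap resampling of the difference
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"difference = {observed:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")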

4) Thou shalt worship the almighty power! Despite the preceding commandment, accepting the null hypothesis is a vile, ungodly thing. Always make sure thou hast the statistical power and a small enough difference relative to what thou carest about to argue that a difference doesn't matter (not just that it isn't "significant"). When in doubt, find a power calculator on the web and do a proper power analysis.
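
A sketch of such a power analysis in Python (the statsmodels package is assumed; the effect size, alpha, and power targets are illustrative):

from statsmodels.stats.power import TTestIndPower

# solve for the per-group sample size needed to detect the
# smallest effect size thou carest about
n = TTestIndPower().solve_power(effect_size=0.3, alpha=0.05, power=0.8)
print(f"~{n:.0f} observations per group needed")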

5) Thou shalt abhor tiny little time series! All too often people are seduced by "trends" of two or three data points, damning themselves to eternal hellfire. The two-tailed probability of a flawless "trend" with six points is 0.0625 (!). "Before" and "after" comparisons are no better than a single coin flip, unless the points in each category have significantly different averages. Coincidences are often coincidences: if (say) the biggest extinction happened in the same interval as the biggest climate change, and there are ten intervals, well, p = 0.10. So, demand that a time series analysis include a healthy number of data points, at least a dozen or a score or a cubit.
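
The arithmetic behind that 0.0625, sketched in Python (my addition): n points give n - 1 successive moves, so a flawless monotonic run has two-tailed probability 2 * (1/2)^(n-1) under a fair-coin null.

for n in (3, 6, 12, 20):
    p = 2 * 0.5 ** (n - 1)             # two-tailed run probability
    print(f"n = {n:2d} points: p = {p:.6f}")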

6) Thou shalt difference thy data! Time series data are almost always autocorrelated (and thou shalt test for that). Still, people insist on interpreting "trends" shared by pairs of time series as meaningful cross-correlations, even though autocorrelation makes finding these demonic things the null hypothesis! Even random walks produce such patterns! FEAR YE SINNERS! The easiest and most powerful way to remove the autocorrelation is to take first differences. So, the next time thou wantest to correlate population growth with the rate of sea-floor spreading - and people will - difference thy !@#$% data.
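
A sketch of the trap and the cure in Python (NumPy and SciPy assumed; the walks are simulated): two independent random walks correlate spuriously, and first differences dissolve the illusion.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
walk1 = np.cumsum(rng.normal(size=200))   # two independent random walks
walk2 = np.cumsum(rng.normal(size=200))

r_raw, p_raw = stats.pearsonr(walk1, walk2)
r_diff, p_diff = stats.pearsonr(np.diff(walk1), np.diff(walk2))
print(f"raw series:        r = {r_raw:+.3f}, p = {p_raw:.3g}")
print(f"first differences: r = {r_diff:+.3f}, p = {p_diff:.3g}")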

7) Thou shalt not play with PCA! Principal components analysis assumes linear responses of observed variables to underlying variables, but most ecological data show modal responses. Vain mortal, what power grants thee the right to assume linearity? Correspondence analysis can handle both kinds of responses and works wonderfully on modal data (we won't mention that nasty little arch effect...).
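
A bare-bones sketch of the correspondence analysis algebra in Python (my addition; NumPy assumed, the count table is illustrative, and this is no substitute for a full CA package):

import numpy as np

N = np.array([[20.,  5.,  1.],    # e.g. species counts by site
              [10., 15.,  5.],
              [ 1., 10., 20.]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
row_scores = (U * sv) / np.sqrt(r)[:, None]          # row principal coordinates
print(row_scores[:, :2])                             # first two CA axes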

8) Thou shalt not cluster shamelessly! The world is full of fuzziness and apostasy, not cool, clean Platonic categories. But cluster analysis imposes categories on data regardless of whether they're gradiential. If the clusters are really there, thou shalt see them as a ray of divine light in the shadowy purgatory of a multivariate ordination space. So why bother?
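
A sketch of the sin in action, in Python (scikit-learn assumed; the cloud is simulated): k-means happily carves a smooth, cluster-free gradient into confident "groups".

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
cloud = rng.uniform(size=(300, 2))   # one continuous cloud, no real groups
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(cloud)
print(np.bincount(labels))           # three tidy "clusters", none of them real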

9) Thou shalt stand awe-struck before the shining brilliance of the G-test! Chi-square this, chi-square that. The G is easier to compute, it doesn't blow up as easily because of small values, it depends on the awesome power of the log transform, it stands for "GOD," and most importantly it's a maximum likelihood ratio...
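
A sketch of the G-test beside the chi-square in Python (SciPy assumed; the table is illustrative). The statistic is G = 2 * sum(O * ln(O / E)).

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 10],
                  [20, 25]])
chi2, p_chi2, _, _ = chi2_contingency(table, correction=False)
g, p_g, _, expected = chi2_contingency(table, lambda_="log-likelihood",
                                       correction=False)
print(f"chi-square = {chi2:.3f} (p = {p_chi2:.4f})")
print(f"G          = {g:.3f} (p = {p_g:.4f})")
print(2 * np.sum(table * np.log(table / expected)))   # G by hand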

10) Thou shalt sing the praises of likelihood, not "fit"! Anyone can design another fit statistic. Why minimize the sum of squares instead of the sum of cubes or just the sum of differences? None of this has a theoretical basis without a notion of probability, and specifically of likelihood. After all, that's what the divine theologian Popper said.
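
A sketch of the point in Python (NumPy and SciPy assumed; the data are simulated): with Gaussian errors, maximizing the likelihood reproduces least squares, which is the probabilistic justification a bare "fit" statistic lacks.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

def neg_log_lik(params):
    slope, intercept, sigma = params
    return -norm.logpdf(y, loc=slope * x + intercept, scale=sigma).sum()

mle = minimize(neg_log_lik, x0=[1.0, 0.0, 1.0], method="L-BFGS-B",
               bounds=[(None, None), (None, None), (1e-6, None)]).x
ols = np.polyfit(x, y, 1)
print("ML  slope, intercept:", mle[:2])   # matches least squares
print("OLS slope, intercept:", ols)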

Original Source: http://www.nceas.ucsb.edu/~alroy/JA_commandments.html

Trading Timeframe!

[chart image no longer available]

Stress Free Trading (ahhh)

[chart image no longer available]

Determining the probability of your option position being profitable

Probabilities are hard to define and quantify, so the subjective nature of the inputs cannot be completely eliminated. I work on the assumption that the variables follow a normal distribution.
I shall take a simple linear payoff to explain what I've understood: writing a naked call option.

1) Collect a sufficient sample of prices. [e.g., 500 bars of Nifty data]
2) Determine the number of trading days remaining to maturity, and call it "n". [If you're buying a call option today (the 22nd) for the October expiry (the 25th), the number of days remaining is 3]
3) Calculate the n-day rate of change (ROC), in % terms, for the sample collected in step 1. [calculate the 3-day ROC of the 500 bars]
4) Determine your break-even point, and the % change in price needed to reach it. [If you're selling a 5600 call with the CMP at 5184, the % change to break even is 8%]
5) Calculate the average ROC of the series from step 3. [the sum of the step-3 values divided by their count; here, 0.4971]
6) Calculate the standard deviation of the ROC series from step 3. [StDev = 2.71625]
7) The z-score is (step 4 - step 5) / (step 6). [(8 - 0.4971)/2.71625 = 2.7622]
8) Find the probability for that z-value by looking up a normal distribution table: http://www.isixsigma.com/library/con...stribution.asp
9) For a z-value of 2.7622, the probability is 0.4971 on the right side of the distribution (between the mean and z). Since a fall in prices does not hurt a written call, the probability that the change stays below the mean (step 5) is 0.5.

We can thus conclude that if we write a 5600 Nifty call, the probability of the trade being successful is 0.5 + 0.4971 = 0.9971, or 99.71%.

Although this example is for a linear payoff, the probability of success for a non-linear payoff can be calculated using a two-tailed test.
Suggested reading : http://en.wikipedia.org/wiki/Normal_distribution
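
A minimal sketch of steps 1-9 in Python (my addition; NumPy and SciPy assumed, and the function name and data file are hypothetical placeholders):

import numpy as np
from scipy.stats import norm

def prob_below_breakeven(closes, n_days, pct_to_breakeven):
    # steps 3-7: n-day ROC in %, its mean and stdev, then the z-score
    roc = (closes[n_days:] / closes[:-n_days] - 1) * 100
    z = (pct_to_breakeven - roc.mean()) / roc.std(ddof=1)
    return norm.cdf(z)       # steps 8-9: P(change stays below breakeven)

# closes = np.loadtxt("nifty_closes.txt")   # hypothetical 500-bar file
# print(prob_below_breakeven(closes, n_days=3, pct_to_breakeven=8.0))

# checking the worked example from its summary statistics:
z = (8.0 - 0.4971) / 2.71625
print(z, norm.cdf(z))        # 2.7622 and ~0.9971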


Metastock formula for automating the above example:
x:= Input("No of trading days to maturity",1,10000,25);
y:= Input("% to Breakeven",-1000,1000,0);
a:= ROC(C,x,%); { x-day rate of change of the close, in % terms }
b:= ExtFml( "JRS_MSX.Stdev",a,Cum(1) - x,1); { standard deviation of the ROC series }
d:= ExtFml( "JRS_MSX.MOV",a,Cum(1) - x,S); { simple average of the ROC series }
f:= LastValue(d); { latest mean ROC }
g:= LastValue(b); { latest ROC standard deviation }
i:= (y - f)/g; { z-score: (% to breakeven - mean ROC) / stdev }
i

Required download: JRS_MSX.dll (or forum20.dll), available on the MetaStock forum.