Tuesday, December 09, 2014

Stock Price Volatility: Log-Normal or Highly Deterministic superimposed with AWGN?

Below is a quick write-up of results I originally found in 2005 and am finally writing down in 2014.  I want to get it up so I can show it to someone who was asking about it, so a lot of it is left unexplained.  Volatility is a measure of the width of the distribution of price ratios; in many figures below it is the second number in the title, the one following "+/-".  The volatility used here is the same "volatility" used in describing stock price motions.  Anyway, for what it's worth, here it is.

The Black-Scholes formula for estimating the value of a stock option is rather elegant.  It estimates the value of the stock option by assuming a particular random distribution of future stock price movements, and averaging over all of these to come up with the current value of a stock option as an expectation value over a random variable.  A second way of deriving the Black-Scholes formula is to come up with a strategy for fully hedging an options position by buying and/or selling short stock, in which case the Black-Scholes price of the option is the price at which there are no arbitrage gains to be had by trading options against stock positions.
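As a rough illustration of the "expectation value" view, here is a minimal sketch that prices a European call two ways: with the standard textbook closed form, and by numerically averaging the discounted payoff over a log-normal distribution of final prices.  The risk-free rate, the risk-neutral drift (rate - vol^2/2), and all function and parameter names are assumptions brought in from the standard textbook treatment; none of them appear in the analysis below.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def bs_call(spot, strike, rate, vol, years):
    """Textbook Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (
        vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)

def expected_call(spot, strike, rate, vol, years, steps=20000, z_max=8.0):
    """Discounted average payoff, assuming x = log(P_final / spot) is normal
    with mean (rate - vol^2/2) * years and standard deviation vol * sqrt(years).
    The average is a midpoint-rule integral over the standard normal variable z."""
    h = 2.0 * z_max / steps
    total = 0.0
    for i in range(steps):
        z = -z_max + (i + 0.5) * h
        x = (rate - 0.5 * vol ** 2) * years + vol * math.sqrt(years) * z
        payoff = max(spot * math.exp(x) - strike, 0.0)
        total += payoff * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * h
    return math.exp(-rate * years) * total

# Illustrative inputs only (hypothetical, not taken from the data below).
print(bs_call(100.0, 100.0, 0.05, 0.2, 1.0))
print(expected_call(100.0, 100.0, 0.05, 0.2, 1.0))
```

The two numbers should agree closely, which is the point: the closed form is just the log-normal average done analytically.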

Underlying the calculation of the Black-Scholes price is the assumption that price variations are log-normally distributed.  That is to say, if P2 is the price of the stock at time t2, and P1 is the price of the stock at time t1 = t2 - dt, then for all different values of t2, the variable x = log( P2/P1 ) is a random variable with normal (or Gaussian) distribution.

In 2005 I analyzed the prices of a real stock to see whether they actually fit a log-normal distribution.  I took the daily closing prices of a tech company stock over a 14-year period.
14 years of closing prices
Now I take the price data and find the ratio of the closing prices on sequential days.  I take the logarithm of those values and plot them in a histogram.
Price ratios for 1 day spacing between prices
The dashed line shows a "best fit" Gaussian, where the fit is found as a Gaussian with the same mean and standard deviation as the histogram, and with amplitude chosen to make the area under the Gaussian equal the total number of data points plotted.  The title shows the mean and standard deviation of the Gaussian, but normalized to annualized change rates.  So the actual mean of the above Gaussian is 32.1% divided by 252 trading days in a year, the actual standard deviation is 62.2% divided by sqrt(252 trading days in a year).  
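The steps just described (log of price ratios, mean and standard deviation, annualizing) can be sketched roughly as follows.  This is not the original 2005 code; the function names are made up for illustration, natural logarithms are assumed, and the use of the population standard deviation and of overlapping windows for spacings longer than 1 day are assumptions too.  The last function scales the Gaussian by the bin width so that it can be drawn over a histogram of counts, which is one possible reading of "area equal to the number of data points."

```python
import math
import statistics

TRADING_DAYS = 252  # trading days per year, as in the text

def log_ratios(prices, spacing=1):
    """log(P2/P1) for every pair of closing prices `spacing` trading days apart.
    Assumption: overlapping windows are used when spacing > 1."""
    return [math.log(prices[i + spacing] / prices[i])
            for i in range(len(prices) - spacing)]

def annualized_stats(prices, spacing=1):
    """Return (annualized mean, annualized standard deviation) of the log ratios.
    The mean scales with the number of periods per year, the standard
    deviation with its square root."""
    x = log_ratios(prices, spacing)
    periods_per_year = TRADING_DAYS / spacing
    mean = statistics.mean(x)
    std = statistics.pstdev(x)  # assumption: population standard deviation
    return mean * periods_per_year, std * math.sqrt(periods_per_year)

def fitted_gaussian_counts(x, mean, std, n_points, bin_width):
    """Height of the 'best fit' Gaussian at x, in histogram counts per bin.
    mean and std are the un-annualized values for the chosen spacing."""
    density = math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
    return n_points * bin_width * density

# Tiny made-up price series, only to show the calling convention.
prices = [100.0, 110.0, 121.0]
print(log_ratios(prices))
print(annualized_stats(prices))
```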

Finally, we will show plots like this with a logarithmic y-axis so that tail behavior can be seen clearly.  The daily volatility plot above then looks like this:
Daily Volatility with y-axis on logarithmic scale
What looked like a great fit to the data on a linear scale is now seen to have really serious problems in its tails.  There are about 10 ratios on the low side, on the left of the plot, and maybe 12 on the high side, on the right, where the underlying Gaussian distribution would have predicted a very low probability of seeing any events at all in the 14 years of data.  In particular, the log-normal prediction is that we had a less than 1 in 1000 chance of seeing any log(PriceRatio)>0.2, but we actually see 3!  Even the "close-in" outliers are highly improbable price ratios.  And yet the central part of the price ratio distribution looks to be fit rather well by a Gaussian.
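The "less than 1 in 1000" figure can be roughly reproduced from the fitted Gaussian alone.  The sketch below takes the annualized mean (32.1%) and standard deviation (62.2%) from the plot title, converts them back to daily values, and asks how likely it is that at least one day exceeds log(PriceRatio) = 0.2.  It assumes natural logarithms, independent days, and exactly 14 x 252 trading days; the exact number of data points is not given here, so treat the result as an order-of-magnitude check.

```python
import math

TRADING_DAYS = 252
YEARS = 14  # length of the price history described above

def gaussian_upper_tail(threshold, mean, std):
    """P(X > threshold) for X normal with the given mean and standard deviation."""
    z = (threshold - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Daily parameters recovered from the annualized numbers in the plot title.
daily_mean = 0.321 / TRADING_DAYS
daily_std = 0.622 / math.sqrt(TRADING_DAYS)

n_days = YEARS * TRADING_DAYS  # assumption: 252 daily ratios per year

p_single = gaussian_upper_tail(0.2, daily_mean, daily_std)
# Probability of at least one exceedance in n_days independent draws,
# written with log1p/expm1 to stay accurate for very small p_single.
p_any = -math.expm1(n_days * math.log1p(-p_single))

print("P(one day > 0.2)      =", p_single)
print("P(any day > 0.2)      =", p_any)
print("expected count > 0.2  =", n_days * p_single)
```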

Price Ratios with about 1 Month between Prices
With about 1 month between prices, we can see that the logs of the price ratios are bigger than with only 1 day between prices.  This makes sense: the price of a stock changes more in 1 month than in 1 day.  In fact, from the log-normal model of daily prices, we would expect the price changes over N days to be on average sqrt(N) times larger than the price changes over 1 day.  And indeed, the annualized volatility shown in the title of the plot is about the same as for the 1 day spacing.  The annualized volatility is found by taking the actual volatility and multiplying it by sqrt(252/N), where N is the number of trading days between price points and 252 is the number of trading days in a year.  So as long as we see that volatility number (the second number in the title) staying about the same, the price ratios are behaving as you would expect for log-normal variables uncorrelated on the daily scale.
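To see why the annualized number should stay put, here is a small simulation sketch using made-up Gaussian daily log ratios, not the real price data.  It is consistent with the daily case, where the daily standard deviation is the annual figure divided by sqrt(252): an N-day standard deviation is annualized by scaling it with sqrt(252/N).  The function names, the seed, and the sample sizes are arbitrary choices for illustration.

```python
import math
import random
import statistics

TRADING_DAYS = 252

def annualize_vol(vol_n, n_days):
    """Scale a standard deviation measured over n_days up to one year."""
    return vol_n * math.sqrt(TRADING_DAYS / n_days)

def simulated_annualized_vol(daily_std, n_days, n_periods, seed=0):
    """Annualized volatility of simulated N-day log ratios, where each
    N-day log ratio is the sum of n_days independent Gaussian daily ones."""
    rng = random.Random(seed)
    sums = [sum(rng.gauss(0.0, daily_std) for _ in range(n_days))
            for _ in range(n_periods)]
    return annualize_vol(statistics.pstdev(sums), n_days)

# Daily standard deviation matching the 62.2% annualized figure quoted earlier.
daily_std = 0.622 / math.sqrt(TRADING_DAYS)

for n in (1, 21, 126, 252):
    print(n, simulated_annualized_vol(daily_std, n, n_periods=2000))
```

For uncorrelated Gaussian days, every spacing should come out near the same annualized value, up to sampling noise.  Real data that departs from this pattern is showing something the simple model does not contain.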
About half a year between prices
With about a half year between prices, we are seeing a very strong bunch of outliers on the positive price ratio side.  One can imagine a narrowing Gaussian peak for "most" of the non-outlier points, and a separate set of high ratio points that are not part of the Gaussian distribution.

Price Ratios with about 1 year between prices
With one year between the prices, we are seeing significant deviation from a Gaussian distribution.  Our best fit Gaussian does NOT fit the "central peak" very well anymore.  The outliers on the right have increased the mean of log(price ratio) to a higher value than characterizes the central peak.  The outliers on the right are also dominating the standard deviation calculation, so that the "best fit" Gaussian is now clearly too wide and clearly too far to the right compared to the "central peak" of the data.  At this point we would probably want to model the data as "some points fit a Gaussian, the rest do not."