Nassim N. Taleb on the signal-to-noise ratio (and why you shouldn't read the news)

Nassim Taleb has made the point that by sampling an information source very frequently you will end up seeing more noise than signal. The purpose of this page is to show how to reproduce his results.

We start with the magnitude of the signal for a given interval of time. We assume that the signal over one interval equals its annual value divided by the number of intervals in a year, so the signal shrinks in direct proportion to the length of the interval. We also assume that the variance of the noise is divided by the same number of intervals, so the standard deviation of the noise shrinks only with the square root of the interval length. The noise therefore shrinks more slowly than the signal, and the shorter the interval, the more noise we see for each unit of signal. We compare the signal with the standard deviation rather than the variance because the standard deviation has the same units as the signal; the variance is in squared units. (For more background, see the Wikipedia article "Signal-to-noise ratio".)
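
The two assumptions can be written compactly. Here μ and σ denote the annual signal and the annual standard deviation of the noise, and n the number of intervals per year (this notation is ours, not Taleb's):

$$
\text{signal}_n = \frac{\mu}{n}, \qquad
\text{noise}_n = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}, \qquad
\frac{\text{noise}_n}{\text{signal}_n} = \frac{\sigma}{\mu}\,\sqrt{n}.
$$

The noise-to-signal ratio therefore grows with the square root of the sampling frequency: sampling four times as often doubles it. With μ = 15 and σ = 10, as in Taleb's example, the ratio is (2/3)√n.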

1. Fooled by Randomness

In Fooled by Randomness (2nd edition, p 65), Taleb writes:

A 15% return with a 10% volatility (or uncertainty) per annum translates into a 93% probability of success in any given year. But seen at a narrow time scale, this translates into a mere 50.02% probability of success over any given second as shown in [his] Table 3.1.
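
Under the normal model assumed on this page, the probability of a positive return over one interval depends only on the per-interval signal-to-noise ratio: with n intervals per year it is Φ((μ/n)/(σ/√n)) = Φ((μ/σ)/√n), where Φ is the standard normal CDF. A minimal R sketch, assuming normally distributed returns; prob_positive is a hypothetical helper name, not Taleb's:

```r
# Hypothetical helper: probability that one interval's return is positive,
# assuming returns are normal with mean mu/n and standard deviation sigma/sqrt(n).
prob_positive <- function(mu, sigma, n) {
  snr <- (mu / n) / (sigma / sqrt(n))   # per-interval signal-to-noise ratio
  pnorm(snr)                            # P(X > 0) = Phi(mean / sd) for a normal X
}

seconds_per_year <- 260 * 8 * 60 * 60   # trading calendar assumed in the script on this page
round(100 * prob_positive(15, 10, 1), 2)                 # one year: 93.32
round(100 * prob_positive(15, 10, seconds_per_year), 2)  # one second: 50.02
```

As n grows, (μ/σ)/√n goes to zero and the probability approaches 50%, a coin flip.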

He also writes (2nd edition, pp 66-67):

Viewing it from another angle, if we take the ratio of noise to what we call nonnoise (i.e., left column/right column [of his Table 3.1]), which we have the privilege here of examining quantitatively, then we have the following. Over one year we observe roughly 0.7 parts noise for every one part performance. Over one month, we observe roughly 2.32 parts noise for every one part performance. Over one hour, 30 parts noise for every one part performance, and over one second, 1,796 parts noise for every one part performance.
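
Taleb does not state his calendar in this passage. If his figures follow the same square-root scaling assumed on this page, the ratio (σ/μ)√n can be inverted to n = (ratio × μ/σ)², the number of intervals per year that each quoted figure implies. A hedged R sketch; implied_intervals is a hypothetical helper, and because his figures are rounded the results are approximate:

```r
# Hypothetical helper: solve noise/signal = (sigma / mu) * sqrt(n) for n,
# assuming Taleb used the same square-root scaling as this page.
implied_intervals <- function(ratio, mu = 15, sigma = 10) {
  (ratio * mu / sigma)^2
}

n_hour   <- implied_intervals(30)    # "30 parts noise" per hour
n_second <- implied_intervals(1796)  # "1,796 parts noise" per second

n_hour / 8                 # implied trading days per year at 8 hours per day: 253.125
n_second / (8 * 60 * 60)   # implied trading days per year at 8 hours per day: about 252
```

Both land near 252 trading days of 8 hours, a count often used for US exchanges; with 252 days the per-second ratio comes out at about 1796. This suggests, but does not prove, which calendar Taleb used.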

The following R script reproduces his results. The vector time holds the number of intervals into which one year is divided. For example, with four quarters the 15% annual return becomes 15%/4 = 3.75% per quarter, while the 10% annual standard deviation becomes 10%/√4 = 5% per quarter: the variance is divided by the number of intervals, and the standard deviation is the square root of the variance. We then compute the probability that a normally distributed return with that mean and standard deviation is positive.

mean <- 15                            # 15% annual return: the signal
sd <- 10                              # 10% annual volatility: the standard deviation of the noise
days_per_year <- 260                  # number of trading days per year
hours_per_day <- 8                    # number of trading hours per day
time <- c(1, 4, 12, days_per_year, days_per_year * hours_per_day,
          days_per_year * hours_per_day * 60,
          days_per_year * hours_per_day * 60 * 60)
label <- c("year", "quarter", "month", "day", "hour", "minute", "second")
taleb <- c(0.7, NA, 2.32, NA, 30, NA, 1796) # his numbers
signal <- mean / time
noise <- sd / sqrt(time)

## what fraction of the distribution is > 0?
probability <- pnorm(0, mean = signal, sd = noise, lower.tail = FALSE)
probability_pct <- signif(100 * probability, digits = 4)
data.frame(row.names = label,
           year_fraction = time,
           probability_pct = probability_pct,
           signal = signif(signal, digits = 4),
           noise = signif(noise, digits = 4),
           noise_per_signal = signif(noise / signal, digits = 4),
           taleb_noise_per_signal = taleb)

	year_fraction	probability_pct	signal	noise	noise_per_signal	taleb_noise_per_signal
year	1	93.32	15	10	0.6667	0.7
quarter	4	77.34	3.75	5	1.333
month	12	66.75	1.25	2.887	2.309	2.32
day	260	53.71	0.05769	0.6202	10.75
hour	2080	51.31	0.007212	0.2193	30.4	30
minute	124800	50.17	0.0001202	0.02831	235.5
second	7488000	50.02	2.003e-06	0.003654	1824	1796

2. Antifragile

Similarly, in Antifragile (p 126), he writes:

Assume further that for what you are observing, at a yearly frequency, the ratio of signal to noise is about one to one (half noise, half signal)—this means that about half the changes are real improvements or degradations, the other half come from randomness. This ratio is what you get from yearly observations. But if you look at the very same data on a daily basis, the composition would change to 95 percent noise, 5 percent signal. And if you observe data on an hourly basis, as people immersed in the news and market price variations do, the split becomes 99.5 percent noise to 0.5 percent signal.

The following R script applies the same scaling, normalizing the annual signal and the annual noise to 1, and reports each as a share of their sum:

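time <- c(1, 365, 365 * 24)           # observations per year: yearly, daily, hourly (calendar time)
label <- c("year", "day", "hour")
signal <- 1 / time                    # annual signal normalized to 1, split evenly across intervals
noise <- 1 / sqrt(time)               # annual noise (standard deviation) normalized to 1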
data.frame(row.names = label,
           year_fraction = time,
           signal_pct = signif(100 * signal / (signal + noise), digits = 4),
           noise_pct = signif(100 * noise / (signal + noise), digits = 4))

	year_fraction	signal_pct	noise_pct
year	1	50	50
day	365	4.974	95.03
hour	8760	1.057	98.94
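
With signal 1/n and noise 1/√n, the noise share simplifies to (1/√n)/(1/n + 1/√n) = √n/(1 + √n), and inverting gives n = (p/(1 - p))² observations per year for a target noise share p. A minimal R sketch of both directions; noise_share and intervals_for_share are hypothetical helper names:

```r
# Noise share when annual signal and noise are both 1,
# the signal scales as 1/n and the noise as 1/sqrt(n) (the model above).
noise_share <- function(n) sqrt(n) / (1 + sqrt(n))

# Inverse: observations per year needed for a noise share p (0 < p < 1).
intervals_for_share <- function(p) (p / (1 - p))^2

round(100 * noise_share(c(1, 365, 365 * 24)), 2)  # 50.00 95.03 98.94
intervals_for_share(0.95)    # 361: close to daily sampling
intervals_for_share(0.995)   # 39601: about 4.5 observations per calendar hour
```

Under this model, Taleb's 95 percent daily figure corresponds to about 361 observations per year, while his 99.5 percent figure would require sampling more often than hourly; hourly sampling gives the 98.94 percent shown above.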

3. Many thanks

Alex Aronovich pointed out a gross error that I fixed long ago but then forgot to publish.
Steven Vandekerckhove pointed out the proper number of trading days per year and trading hours per day, which made the results align better with Taleb's. He also made several suggestions that greatly improved the presentation.

4. More reading

Others have published similar analyses: