# Nassim N. Taleb on the signal-to-noise ratio (and why you shouldn't read the news)

Nassim Taleb has made the point that by sampling an information source very frequently you will end up seeing more noise than signal. The purpose of this page is to show how to reproduce his results.

We start with the magnitude of the signal for a given interval of time.
We assume that the amount of signal is a constant divided by the
number of intervals into which the period is divided, so the signal
shrinks linearly as we move to shorter time scales. We also assume the
variance of the noise is inversely proportional to the number of
intervals, so the standard deviation is inversely proportional to the
*square root* of that number. We use the standard deviation because it
has the same units as the signal; the variance is in squared units.
(There is a bit more in the Wikipedia article on signal-to-noise ratio.)
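
The two scaling assumptions can be written compactly. The symbols here are introduced only for illustration and are not Taleb's notation: $\mu$ is the signal over the full period, $\sigma$ is the standard deviation of the noise over the full period, and $n$ is the number of equal intervals into which the period is divided.

$$
S(n) = \frac{\mu}{n}, \qquad
N(n) = \frac{\sigma}{\sqrt{n}}, \qquad
\frac{N(n)}{S(n)} = \frac{\sigma}{\mu}\,\sqrt{n}.
$$

Under these assumptions the noise-to-signal ratio grows with the square root of the sampling frequency: observing four times as often doubles the amount of noise seen per unit of signal.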

## 1. Fooled by Randomness

In *Fooled by Randomness* (2nd edition, p 65), Taleb writes:

> A 15% return with a 10% volatility (or uncertainty) per annum translates into a 93% probability of success in any given year. But seen at a narrow time scale, this translates into a mere 50.02% probability of success over any given second as shown in [his] Table 3.1.

He also writes (2nd edition, pp 66-67):

> Viewing it from another angle, if we take the ratio of noise to what we call nonnoise (i.e., left column/right column [of his Table 3.1]), which we have the privilege here of examining quantitatively, then we have the following. Over one year we observe roughly 0.7 parts noise for every one part performance. Over one month, we observe roughly 2.32 parts noise for every one part performance. Over one hour, 30 parts noise for every one part performance, and over one second, 1,796 parts noise for every one part performance.

His results are reproduced by the following R script. The vector `time`
contains the number of intervals into which one year is divided. For
example, the 15% annual return becomes 15%/4 per quarter, while the
standard deviation over the same interval is divided by the square root
of the number of quarters: the variance is divided by the number of
intervals, and the standard deviation is the square root of the
variance. We then compute the probability that a normally distributed
return with that mean and standard deviation is positive.

    mean <- 15 # 15% return, the signal
    sd <- 10 # 10% error rate per annum, the standard deviation of the noise
    days_per_year <- 260 # number of trading days per year
    hours_per_day <- 8 # number of trading hours per day
    time <- c(1,
              4,
              12,
              days_per_year,
              days_per_year * hours_per_day,
              days_per_year * hours_per_day * 60,
              days_per_year * hours_per_day * 60 * 60)
    label <- c("year", "quarter", "month", "day", "hour", "minute", "second")
    taleb <- c(0.7, NA, 2.32, NA, 30, NA, 1796) # his numbers
    signal <- mean / time
    noise <- sd / sqrt(time)
    ## what fraction of distribution is > 0?
    probability <- pnorm(0, mean = mean / time, sd = sd / sqrt(time),
                         lower.tail = FALSE)
    probability_pct <- signif(100 * probability, digits = 4)
    data.frame(row.names = label,
               year_fraction = time,
               probability_pct = probability_pct,
               signal = signif(signal, digits = 4),
               noise = signif(noise, digits = 4),
               noise_per_signal = signif(noise / signal, digits = 4),
               taleb_noise_per_signal = taleb)

| | year_fraction | probability_pct | signal | noise | noise_per_signal | taleb_noise_per_signal |
|---|---|---|---|---|---|---|
| year | 1 | 93.32 | 15 | 10 | 0.6667 | 0.7 |
| quarter | 4 | 77.34 | 3.75 | 5 | 1.333 | |
| month | 12 | 66.75 | 1.25 | 2.887 | 2.309 | 2.32 |
| day | 260 | 53.71 | 0.05769 | 0.6202 | 10.75 | |
| hour | 2080 | 51.31 | 0.007212 | 0.2193 | 30.4 | 30 |
| minute | 124800 | 50.17 | 0.0001202 | 0.02831 | 235.5 | |
| second | 7488000 | 50.02 | 2.003e-06 | 0.003654 | 1824 | 1796 |
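
The scaling assumption (variance divided by the number of intervals) can be sanity-checked by simulation. The sketch below is not Taleb's method; it assumes, as the script above does, that returns are independent and normally distributed. The seed, the number of simulated years, and all variable names are arbitrary choices made for illustration. It draws monthly returns, sums them into years, and checks that the yearly figures come back out.

```r
# Monte Carlo check of the scaling assumptions (illustrative sketch only).
set.seed(42)                # arbitrary seed, for reproducibility only
annual_mean <- 15           # 15% return per year
annual_sd   <- 10           # 10% volatility per year
n_periods   <- 12           # months per year
n_sim       <- 100000       # number of simulated years (arbitrary)

# Each row is one simulated year; each column is one month.
monthly <- matrix(rnorm(n_sim * n_periods,
                        mean = annual_mean / n_periods,
                        sd   = annual_sd / sqrt(n_periods)),
                  nrow = n_sim, ncol = n_periods)
yearly <- rowSums(monthly)  # twelve monthly returns add up to one year

sim_check <- c(
  p_month_positive = mean(monthly > 0),  # analytic value: about 0.6675
  p_year_positive  = mean(yearly > 0),   # analytic value: about 0.9332
  yearly_mean      = mean(yearly),       # should be close to 15
  yearly_sd        = sd(yearly)          # should be close to 10
)
print(round(sim_check, 3))
```

If the assumptions hold, the simulated fractions should land near the `probability_pct` values for the month and year rows, up to sampling error.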

## 2. Antifragile

Similarly, in *Antifragile* (p 126), he writes:

> Assume further that for what you are observing, at a yearly frequency, the ratio of signal to noise is about one to one (half noise, half signal)—this means that about half the changes are real improvements or degradations, the other half come from randomness. This ratio is what you get from yearly observations. But if you look at the very same data on a daily basis, the composition would change to 95 percent noise, 5 percent signal. And if you observe data on an hourly basis, as people immersed in the news and market price variations do, the split becomes 99.5 percent noise to 0.5 percent signal.

The same scaling applies here, with signal and noise set equal at the yearly scale and with calendar days and hours rather than trading ones:

    time <- c(1, 365, 365 * 24)
    label <- c("year", "day", "hour")
    signal <- 1 / time
    noise <- 1 / sqrt(time)
    data.frame(row.names = label,
               year_fraction = time,
               signal_pct = signif(100 * signal / (signal + noise), digits = 4),
               noise_pct = signif(100 * noise / (signal + noise), digits = 4))

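| | year_fraction | signal_pct | noise_pct |
|---|---|---|---|
| year | 1 | 50 | 50 |
| day | 365 | 4.974 | 95.03 |
| hour | 8760 | 1.057 | 98.94 |

The daily split matches Taleb's 95 percent noise to 5 percent signal. The hourly split comes out at about 98.9 to 1.1, close to but not the same as his 99.5 to 0.5.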
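
Under the same model the noise share has a closed form, which can be inverted to ask how many intervals per year would produce a given split. The following is a minimal sketch; the function names `noise_share` and `intervals_for_share` are hypothetical helpers introduced here, and nothing in it is a claim about how Taleb arrived at his figures.

```r
# Share of noise under the page's model: signal = 1/n, noise = 1/sqrt(n),
# so noise / (signal + noise) = sqrt(n) / (1 + sqrt(n)).
noise_share <- function(n) sqrt(n) / (1 + sqrt(n))

# Inverse: number of intervals per year that gives a target noise share p.
# Solving p = sqrt(n) / (1 + sqrt(n)) for n gives n = (p / (1 - p))^2.
intervals_for_share <- function(p) (p / (1 - p))^2

noise_share(c(1, 365, 365 * 24))     # noise fractions for year, day, hour
intervals_for_share(c(0.95, 0.995))  # roughly 361 and 39601
```

A 95 percent noise share corresponds to roughly 361 intervals per year, which is close to daily sampling. A 99.5 percent share would need roughly 39,601 intervals per year, several times more than the 8,760 hours in a calendar year, so under this model the quoted hourly figure corresponds to sampling more often than once an hour.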

## 3. Many thanks

- Alex Aronovich pointed out a gross error that I fixed long ago but then forgot to publish.
- Steven Vandekerckhove pointed out the proper number of trading days per year and trading hours per day, which made the results align better with Taleb's. He also made several suggestions that greatly improved the presentation.