Admittedly, I’m skeptical about applying the “wisdom of the crowds” approach to economic and market analysis.
Some famous tests of crowd “wisdom” revolve around tasks at which virtually nobody can claim expertise. One version, for instance, entails having people guesstimate the number of M&Ms in a jar. Of course, there are no professional M&M “jar guessers”. Meanwhile, tiny sample sizes run the risk of over-representing people like the LA Lakers fan who, earlier this year, during a “challenge” sponsored by Pechanga Resort & Casino, said “lower” when asked to predict whether the next card shown on the screen would be higher or lower than the previous card, which was zero. So on questions like the M&M jar, it’s reasonable to assume you’ve got everything to gain and nothing to lose from a crowd-sourced approach, provided the “crowd” is some semblance of large.
But when what’s being predicted is amenable to expertise, the benefits of employing a crowd-sourced approach are less clear. There is little chance, for instance, of getting a more accurate “guesstimate” of the distance, in light years, from Earth to a newly-discovered planet from 10,000 random people than from 10 astronomers.
The line between when to employ crowd sourcing and when to rely on experts isn’t bright. Rather, think of it as a continuum, where “How many M&Ms do you think are in this jar?” sits on one end and questions like “How long do you think a lung cancer patient will live under various treatment regimes versus no treatment?” reside on the other.
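For what it’s worth, the statistical case for the jar-guessing end of that continuum is easy to demonstrate. The toy simulation below (every number in it is invented, obviously) shows the median of a large pile of noisy guesses landing far closer to the truth than the typical individual guess does:

```python
import random
import statistics

random.seed(42)

TRUE_COUNT = 1000    # hypothetical number of M&Ms in the jar
N_GUESSERS = 10_000  # the "crowd"

# Each guesser is individually noisy: their guess is the true count
# scaled by a lognormal error, so plenty of people miss by a wide margin.
guesses = [TRUE_COUNT * random.lognormvariate(0, 0.5) for _ in range(N_GUESSERS)]

crowd_estimate = statistics.median(guesses)
crowd_error = abs(crowd_estimate - TRUE_COUNT)

# The error a typical individual guesser makes, for comparison.
typical_solo_error = statistics.median(abs(g - TRUE_COUNT) for g in guesses)

print(f"crowd median estimate: {crowd_estimate:,.0f}")
print(f"crowd error:           {crowd_error:,.0f}")
print(f"typical solo error:    {typical_solo_error:,.0f}")
```

The point, to be clear, is only that individual errors wash out when nobody has an informational edge; it says nothing about questions where expertise matters, which is precisely the distinction at issue here.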
As regular readers know, I’ve got some fairly extensive experience in post-graduate study both in political science (definitely a “soft” science) and economics. Economics PhDs will tell you the discipline is more “hard” science than “soft”, but that isn’t my experience. In fact, I could point you to a multitude of peer-reviewed academic studies in political science that are far more trenchant than some oft-cited economics papers. But none of that means predicting political and economic outcomes can be summarily dismissed as akin to M&M jar-guessing.
The financial blogosphere and, in some cases, the mainstream financial media, are notorious for maligning the “poor” track record of economists and, in the same vein, for lampooning the extent to which Wall Street often gets it wrong when it comes to predicting things like year-end levels for various assets. The implication is usually that the ubiquitous “monkey throwing darts” could produce results that, on average, would be just as good.
Of course that kind of aspersion casting is dishonest in the extreme. For one thing, it often leans, if only implicitly, on the fact that people other than economists and “professionals” sometimes get it “right” simply by “fading” the professional forecasts. Even if that’s true, it says nothing at all about the abilities of those non-professionals. If the weatherman says it’s going to be sunny tomorrow and I bring an umbrella to the beach solely to spite him, it doesn’t make me “good” at forecasting the weather if it rains. I wasn’t “right” in any real sense, he was just wrong.
Importantly, I have no accountability in that scenario. If I bring an umbrella to the beach and it doesn’t rain, nobody notices, let alone cares. There is no risk to my credibility as an amateur weather forecaster, just like there is no risk to a blogger who, upon hearing that a major bank recommended a long yen expression, jumps on Twitter and says something like “yen to weaken imminently.” That hypothetical blogger will look witty if the bank’s call flounders, but if the bank is correct, nobody will remember that tweet and the ultimate irony is that if anyone did, the blogger would invariably cite their own non-expertise as an excuse (“Oh, give me a break — it was a joke and in any case, I’m not an FX strategist”).
When it comes to economics, the waters are muddied further by the fact that central banks’ forecasts are measured against, and otherwise compared to, “private sector” forecasts. But “private sector” usually still means “professional”, in that the forecasts come from people with training similar to the central bankers’. So, other than chosen profession (public servant versus private sector), it’s not clear it makes sense to frame the comparison as a juxtaposition between “official” forecasts and “crowd sourcing”. The “crowd” isn’t really a “crowd” if it’s composed of professionals. That’s not generally what “crowd-sourced” means.
All of that to set up a brand new Goldman note out Saturday evening called “Can Fed Forecasters Really Beat the Wisdom of Crowds?”, the aim of which is described as follows:
History suggests reason to pay close attention to Fed forecasts, with a well-known academic study showing that Fed staff forecasts outperform private sector consensus forecasts. In this Analyst, we assess how the Fed’s relative forecasting performance has changed over the years, and discuss implications for the outlook today.
The bank sets the stage by reminding you that the March Fed minutes show Fed staff forecasts are now markedly different from their own projections. Here are the visuals in that regard:
Goldman also reminds you that studies show Fed forecasts are generally more accurate than private sector projections, not less. “In a well-known study, economists Christina Romer and David Romer found that Fed staff forecasts of growth and inflation significantly outperformed private sector consensus forecasts from 1980-1991, and a recent San Francisco Fed study reported similar findings over the period 1980-2013”, the bank notes.
To assess how this dynamic has evolved, Goldman looks at the performance of Fed staff forecasts and consensus forecasts from professional forecasters on growth and CPI inflation over the next four quarters, as well as the average unemployment rate in four quarters.
The bottom line: The Fed’s performance edge is waning. To wit:
Comparing the relative forecasting error of the consensus SPF forecast and the Fed staff forecast, as well as the percentage of times the Fed staff has been on the right side of consensus, we find that the Fed’s outperformance has declined. While the Fed outperformed consensus across both criteria and all three variables in the period analyzed in the original Romer and Romer (RR) study, the Fed staff has somewhat underperformed since, with the exception of generally being on the right side of consensus CPI forecasts.
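To make those two criteria concrete, here’s a minimal sketch of how they might be computed. The numbers are fabricated for illustration, and I’m assuming (Goldman doesn’t spell this out) that “relative forecasting error” is a simple RMSE ratio and that being on the “right side of consensus” means the staff forecast deviated from consensus in the direction of the eventual outcome:

```python
# All figures below are invented; they are not Greenbook or SPF data.
actual    = [2.5, 1.8, 3.1, 0.9, 2.2]  # realized GDP growth, %
fed_staff = [2.6, 2.0, 2.8, 1.3, 2.1]  # hypothetical Fed staff forecasts
consensus = [2.9, 1.5, 2.6, 1.6, 2.4]  # hypothetical SPF median forecasts

def rmse(forecast, realized):
    return (sum((f - a) ** 2 for f, a in zip(forecast, realized)) / len(realized)) ** 0.5

# Criterion 1: relative forecast error (a ratio below 1 means the staff won).
relative_error = rmse(fed_staff, actual) / rmse(consensus, actual)

# Criterion 2: share of periods in which the staff was on the "right side"
# of consensus, i.e. it deviated from consensus toward the outcome.
right_side_share = sum(
    (f - c) * (a - c) > 0 for f, c, a in zip(fed_staff, consensus, actual)
) / len(actual)

print(f"relative error:   {relative_error:.2f}")
print(f"right-side share: {right_side_share:.0%}")
```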
What accounts for this? Well, for one thing, there are more private sector forecasters in the sample now. Specifically, the SPF now includes an average of ~40 forecasters versus 25 during the 1980-1991 sample.
On the surface, that supports the idea that crowd sourcing “works”. “More forecasters may lead to better consensus estimates through a ‘wisdom of the crowd’ effect and by providing a more accurate representation of the private sector’s best estimate”, Goldman writes.
But there’s a problem with that. The SPF isn’t really “the crowd”. Recent surveys, for instance, included forecasts from the likes of Credit Suisse’s James Sweeney, Deutsche’s Peter Hooper and, of course, Goldman’s own Jan Hatzius, who helped write the very note in question. This ain’t exactly 10,000 random people counting M&Ms.
Goldman also suggests a sharp drop in the number of forecasters submitting “outlier” (read: crazy) projections could account for more accurate private sector forecasts.
“We define ‘noise forecasters’ as forecasters with growth forecast errors of over 1pp over 85% of the time across all of their SPF submissions, and find that the share of such forecasters has declined from over 30% to roughly zero today”, Goldman says, adding that these folks generally don’t stay in the SPF very long.
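As described, Goldman’s screen is mechanical enough to write down. The sketch below implements it on made-up data (the real exercise runs over the full SPF panel, and the data shape here is my own invention):

```python
# "Noise forecasters": growth forecast errors of more than 1pp in over
# 85% of SPF submissions. The data below is invented for illustration.
submissions = {
    # forecaster id -> list of (forecast, realized) growth pairs, in %
    "A": [(2.0, 2.1), (1.5, 1.4), (3.0, 2.8), (2.2, 2.5)],
    "B": [(5.0, 2.1), (-1.0, 1.4), (6.0, 2.8), (0.1, 2.5)],  # chronically wild
    "C": [(2.5, 2.1), (0.2, 1.4), (2.9, 2.8), (2.4, 2.5)],
}

def is_noise_forecaster(pairs, miss_pp=1.0, miss_share=0.85):
    misses = sum(abs(forecast - realized) > miss_pp for forecast, realized in pairs)
    return misses / len(pairs) > miss_share

noise = {name for name, pairs in submissions.items() if is_noise_forecaster(pairs)}
print(noise)  # only the chronically wild forecaster is flagged
```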
Finally, Goldman observes that private forecasters are simply getting better at their jobs. Specifically, the bank notes that “while the Fed beat roughly 80% of forecasters over 50% of the time from 1980-1991, most forecasters have roughly even performance against the Fed in the period since, with the median forecaster beating the Fed about 50% of the time.”
Rolling it all up, the bank concludes as follows:
The combination of both a higher quantity and higher quality of forecasters has made it considerably more difficult for any forecaster, including the Fed staff, to beat the ‘wisdom of the crowd’ captured by the consensus median.
Before getting to the market takeaways from this, I would gently suggest that some of what Goldman observes here could be at least partially explained by a moderation of macro volatility. After all, it’s easier to make accurate predictions about a less volatile series, and that could also account for the demise of “noise forecasters”. Additionally, if you assume that the Fed is itself responsible for “smoothing” things out over the last three decades, then the “success” of private forecasters could ironically be attributable in part to the efforts of their “competitors”.
From a market perspective, what the above suggests is that the Fed’s informational advantage has waned, and therefore hawkish/dovish “surprises” are more likely to signal an intent to influence outcomes as opposed to being reflective of the central bank “knowing something” that everyone else doesn’t know. Goldman puts it as follows:
If the Fed has an informational advantage relative to the private sector, Fed policy decisions might reveal new information about the state of the economy. But we find that as the Fed’s relative forecasting performance has declined, economic forecasters and financial markets appear to have increasingly interpreted monetary surprises as pure monetary shocks rather than as new information about the economy.
The implication in the current environment is that the Fed’s recent dovish lean is more about providing extra support for growth in the interest of prolonging the cycle, and less about Powell tacitly “admitting” that the Fed has non-public information suggesting the outlook is worse than anyone thinks.
More broadly, this raises a number of familiar questions about crowd sourcing, especially as it relates to topics that don’t neatly fall on one end of the continuum mentioned above. Some of those questions are:
1. Who is “the crowd”, exactly?
2. Is it really “crowd sourcing” when you’re required to be a “professional” to participate?
3. Even if you don’t have to be, strictly speaking, “a professional” to be counted, is a setup that imposes any restrictions whatsoever (i.e., “gatekeepers”) on who can weigh in properly “crowd sourcing”?
4. Related to 3), if such restrictions are necessary to avoid capturing too many LA Lakers fans (see above) when anything more serious than M&M counting is at stake, then what is the threshold beyond which those restrictions make the whole exercise so exclusive that it no longer makes sense to call it “crowd sourcing”?
5. In the same vein as 4), if exclusivity is required when anything of any consequence is at stake, then what is the practical utility of pure crowd sourcing in the first place?
6. Finally, what happens in a scenario where one group of people (in the above case, the Fed) has the ability to influence the outcome that both groups are trying to predict? And further, what happens if “success” in influencing that outcome works to the benefit of “the crowd” (as it appears to in the case of private forecasters and the Fed)?