What’s The Fed Thinking? Ask ChatGPT

Notwithstanding terrifying warnings about “nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us,” I hope the generative AI craze isn’t a fad.

Over the past two or three months, I’ve devoted quite a bit of coverage to ChatGPT and related developments, and AI features prominently in April’s forthcoming monthly letter (not as an author). The last time I ventured out on a limb in the interest of expanding my horizons, it (the limb) nearly snapped before I had time to scurry back and grab ahold of the tree. Regular readers will fondly recall that I exited substantially all of my “investments” (read: speculative forays) in Web3 a mere six weeks before the Terra Luna debacle tipped the first domino on what, by the end of last year, was a $2 trillion wipeout across the cryptoverse.

I wouldn’t call my adventures in the metaverse a waste of time. I learned a lot, I shared everything I learned with readers and I only lost ~10% for my trouble (it’s more like 20% if you include mark-to-market losses on the Bitcoin and Ether I still hold, and around 25% if you assume my handful of NFTs are worth nothing by virtue of being completely illiquid). But I’d rather not repeat the experience. It’s not the 10% (or even the 20% or 25%), it’s the inescapable suspicion that what I learned wasn’t worth the time I spent to learn it. You can always recoup lost money. You can never recoup lost time.

I don’t think the generative AI frenzy will end like the metaverse mania, though. Indeed, I don’t think it’s going to end at all, unless by “end” you mean “end” a lot of human jobs. My rationale for covering it centers entirely on my suspicion that we’re witnessing the dawn of a new technological era. Of course, for about a month last year, I didn’t think decentralized finance was going away either. It hasn’t, but it almost has.

In any case, that’s a circuitous way of introducing a few passages from a recent paper called “Can ChatGPT Decipher Fedspeak?” by Anne Lundgaard Hansen and Sophia Kazinnik, Richmond Fed researchers with PhDs.

I talk a lot about “Fed tasseography” — the art of reading between the lines of policy statements, FOMC meeting minutes and policyspeak. Most people are surprisingly bad at it, but that’s ok because, as it turns out, ChatGPT is already quite adept.

I don’t see a lot of utility in editorializing around the paper. The excerpts found below speak for themselves, and even if they didn’t, you could just ask ChatGPT to write you a summary or, alternatively, you could read the very first “sentence,” which finds Lundgaard Hansen and Kazinnik answering the question they posed in their title: “Yes!”

Below find a few highlights from the 17-page study which, if you run out of things to do on Wednesday, you can read in its entirety here.

Geerling et al. (2023) show that ChatGPT performs well in the Test of Understanding in College Economics (TUCE), a standardized test of economics knowledge, answering 86.7% of the macroeconomics questions correctly. One can therefore think of ChatGPT as a virtual research assistant, potentially qualified for tasks such as classifying central bank communication texts. In principle, GPT models have enough domain knowledge to label economic texts correctly. However, they may not have the same level of nuance and context-awareness as a human research assistant. The technology can, therefore, be either hugely time and resource saving or it can result in misleading or wrong conclusions.

GPT models have the ability to explain why a certain sentence was labeled [dovish, mostly dovish, neutral, mostly hawkish or hawkish], a capability beyond any existing NLP model and a valuable feature for researchers. We test this capability in a short exercise using ChatGPT and the underlying GPT-3 and GPT-4 models. We find that GPT’s reasoning successfully justifies its classifications, and furthermore is very similar to the reasoning provided by a human reviewer. GPT-4 offers an improvement over GPT-3 with more cases of agreement with the human classifications and explanations. The newest version of GPT-4 is therefore likely to generate even stronger performance metrics than those of GPT-3 reported in this paper.

The analysis presented in this paper shows that GPT models demonstrate a strong performance in classifying Fedspeak sentences, especially when fine-tuned. However, it is important to note that despite its impressive performance, GPT-3 is not infallible. It may still misclassify sentences or fail to capture nuances that a human evaluator with domain expertise might capture. Thus, while GPT models may not be able to fully replace human evaluators, they can serve as a highly valuable tool for assisting researchers and analysts in this domain.
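To make the classification task the paper describes concrete, here is a toy keyword-scoring baseline for the five-label scheme (dovish, mostly dovish, neutral, mostly hawkish, hawkish). To be clear, this is not the authors’ method, which uses GPT models; the keyword lists, thresholds and function name below are illustrative assumptions, the kind of crude NLP baseline the GPT approach is meant to improve on.

```python
# Toy Fedspeak classifier: scores a sentence by counting hawkish vs. dovish
# keywords and maps the net score onto the paper's five labels.
# The word lists and thresholds are invented for illustration only.

HAWKISH = {"inflation", "tighten", "tightening", "raise", "hike", "hikes", "restrictive"}
DOVISH = {"accommodative", "easing", "cut", "cuts", "stimulus", "patient"}

def classify_fedspeak(sentence: str) -> str:
    # Lowercase, strip trailing punctuation, and compare against the keyword sets.
    words = {w.strip(".,;:!?").lower() for w in sentence.split()}
    score = len(words & HAWKISH) - len(words & DOVISH)
    if score >= 2:
        return "hawkish"
    if score == 1:
        return "mostly hawkish"
    if score == -1:
        return "mostly dovish"
    if score <= -2:
        return "dovish"
    return "neutral"

print(classify_fedspeak("The Committee decided to raise the target range to curb inflation."))
```

The point of the paper is that a GPT model replaces the brittle keyword sets above with context-aware judgment, and can also explain why it chose a given label, something no lookup table can do.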

One final note: It’s possible that GPT is even more accurate than we’re currently giving it credit for. After all, humans get this wrong all the time, and determining “rightness” is often a matter of what Fed communications subsequently convey. Fedspeak will sometimes be an attempt to “walk back” the market reaction to previous communications, with the implication that the “human” interpretation was wrong. Note the scare quotes.

The fact that markets are already overrun by nonhumans adds another wrinkle — irony atop irony is the fact that we can’t even describe the market’s reaction to Fed communications as “human” because markets are dominated by algorithms. I’m not sure what the future looks like if it’s determined that GPT is indeed superior to humans when it comes to deciphering Fedspeak. Perhaps algorithmic trading models will soon be taking their instructions from ChatGPT.


 
