Over the course of the last three months, the world was reminded that when it comes to outbreaks (and especially epidemics and pandemics), it’s nearly impossible to “stay on top of things”, as it were.
Due to the myriad cognitive limitations inherent in being human, as well as the time it takes to mobilize scarce resources across vast distances, society is destined to be one (and likely many more than one) step behind in the fight to contain a highly contagious disease.
This is exacerbated in the early stages (i.e., prior to lockdowns, travel restrictions and the institution of social distancing measures) by the interconnectedness of a globalized world.
And yet, as authorities rush to catch up with COVID-19, we’ve somehow lost track of the fact that thanks to advances in big data analysis, machine learning and AI, humanity actually does have the capacity to run with (and perhaps even outrun) biological threats like the coronavirus.
Big data, machine learning and AI are buzzwords that resonate across industries. Investing is no exception. Indeed, some in corporate America have developed a veritable obsession with figuring out how best to harness and utilize big data and machine learning to better target consumers, enhance productivity and boost portfolio returns, just to name a few examples.
But we’ve seemingly lost track of this during the pandemic panic – it’s possible we’ve neglected to unleash the only weapon we have capable of staying ahead of the virus.
A few weeks back, JPMorgan’s Marko Kolanovic suggested that we should be doing more to tap into real-time data sets to help assess the evolution of the epidemic. He cited smart thermometers as an example.
Well, in a much longer piece, Kolanovic addresses the points made above and much, much more.
“A leading authority in epidemics recently stated: ‘If you think you are in line with the outbreak, you are already three weeks behind'”, he writes, before immediately noting that “in the age of big data, machine learning and internet of things, this could be different, as technologies exist to have a real-time picture of a pandemic”.
These technologies should be marshaled in order to allow authorities to not only be “in step” with the outbreak, but in fact ahead of it. That, Marko notes, would allow for the optimization of response times and resources, as well as the more effective evaluation of the measures already employed.
Kolanovic – whose ability to communicate in readily accessible language is unique among quants and PhDs – explains the problem in straightforward terms. To wit:
Why are traditional data always lagging the virus? When the first symptoms of an illness appear, most people wait some time to confirm the symptoms or wait in hope that initially mild symptoms go away. After several days they may seek medical help, but appointments or tests may not be immediately available. When virus tests are obtained, results may take several additional days to arrive. In all, from the first onset of symptoms to confirmation of a disease, it may take 1 or 2 weeks, often even longer. If there is a pandemic outbreak, 1 or 2 weeks may mean that the virus has already spread out of control before it even shows in official statistics.
So, in other words, it is entirely possible for an epidemic (or at least the makings of one) to be underway before a single case is officially confirmed.
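To put rough numbers on that lag, here is a back-of-the-envelope sketch (the three-day doubling time is a hypothetical assumption for illustration, not a figure from the note) of how a one-to-two-week reporting delay interacts with exponential spread:

```python
# Rough illustration: how a reporting lag interacts with exponential spread.
# The doubling time here is a hypothetical assumption; real values vary by
# pathogen and setting.

def true_cases_at_confirmation(confirmed_cases: float,
                               lag_days: float,
                               doubling_days: float) -> float:
    """Estimate actual case count today, given that today's confirmations
    reflect infections that occurred lag_days ago."""
    return confirmed_cases * 2 ** (lag_days / doubling_days)

# With a 14-day onset-to-confirmation lag and a hypothetical 3-day doubling
# time, each confirmed case implies roughly 2^(14/3), or about 25,
# contemporaneous cases.
print(round(true_cases_at_confirmation(100, 14, 3)))
```

The point of the arithmetic is simply that the lag is multiplicative, not additive: every doubling period hidden inside the reporting delay doubles the gap between the official count and reality.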
Arguably, that’s what happened with COVID-19, and while I would never want to say anything to minimize or otherwise downplay the human tragedy this outbreak has become, I think it’s crucial for society to at least consider the possibility that the “next one” (if you will) could be something far, far worse – some manner of viral hemorrhagic fever, for example.
In that scenario, staying ahead of it wouldn’t be so much a matter of saving individual lives. It would be a matter of saving humanity in general.
Not to put too fine a point on it, but if the following visual depicted, for example, confirmed Marburg cases, instead of COVID-19 infections, nobody would be talking about the proper timing for reopening economies.
With that in mind, we should take this opportunity to tap the technology at our disposal and create the type of big data-driven infrastructure that could one day allow us to fight a virus with the only thing that moves faster: the processing power of advanced computers.
Kolanovic implores the world to build this infrastructure – now.
“There is no reason why anonymized, aggregate information about onset of disease symptoms could not be available and tracked by artificial intelligence in real time”, he writes, adding that the “output of such a system would be used to prevent and contain pandemics, as well as manage re-opening of economies in the aftermath”.
This system – an “internet of living things” – could have multiple components, Marko suggests.
The first would be what he calls a “global pandemic daily roll call”. This could entail, for instance, a “required landing page on iOS/Android smart phone devices, or via text messages similar to emergency alerts” allowing users to note their status. “A minute of time would be a small ask during a pandemic”, Marko writes. Conceptually speaking, this would be no different from a census, really – and it would save lives.
Before I go further, you should note that he of course includes a caveat about the absolute necessity of protecting data from theft and keeping it out of the hands of nefarious actors.
Second, Kolanovic suggests governments give everyone internet-connected diagnostic tools – for free. This harkens back to the smart thermometer discussion from the linked post above, but it goes much further than that.
“Health authorities could obtain a real-time picture and be able to prevent and manage seasonal outbreaks, region-specific illnesses, or global pandemics”, Marko says, noting that this could even spawn new branches of economics.
Third, there should be a “big data pandemic war room”, where the massive volume of complex data would be processed and analyzed. “Once the data are processed, these war rooms of data scientists and health professionals would coordinate the response with appropriate authorities”, he says.
Fourth, it should be recognized that this effort shouldn’t necessarily be the sole purview of governments. After all, crowdsourcing is a powerful phenomenon which has contributed mightily to everything from web development (e.g., open-source software) to encyclopedic knowledge (e.g., Wikipedia).
“Millions of talented scholars and data scientists across the developed and developing world” should have access to “all the data relevant for the pandemic and remedies… on a central website in real time”, Marko says, before suggesting that the highly contentious hydroxychloroquine debate may well have been settled by now had this approach been adopted. To wit, from Kolanovic:
There are several inconclusive and incompatible studies about the efficacy of this drug (e.g., there are 2 papers from French authors, 2 from Chinese, 1 from Korean, various other data points from governments, hospitals and even individuals, and a number of ongoing official trials). If these data were available in real time, there would likely be a machine learning/AI-driven model that could combine the different datasets to produce a probability model for success of this drug (e.g., integration of traditional and unstructured data, partial and ongoing trial results, etc.). Such a machine learning/AI algorithm would allow medical professionals to make potentially life-saving decisions (be it to use or not to use) ahead of slower traditional statistical studies that are employed during ‘peace time.’
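One simple way to pool partial, incompatible trial results into a single probability estimate is a beta-binomial model, sketched below. The study counts are entirely hypothetical (the actual French, Chinese and Korean papers Kolanovic references are not reproduced here), and a production system would need to weight study quality, not just raw counts:

```python
# Minimal sketch: pooling partial trial results into one probability estimate
# via a beta-binomial model. All study numbers below are hypothetical.

# (responders, patients) in the treatment arm of each hypothetical study
studies = [(14, 26), (6, 11), (48, 62), (30, 81)]

# Start from a flat Beta(1, 1) prior over the drug's response rate;
# each study's successes and failures update the posterior.
alpha, beta = 1.0, 1.0
for successes, n in studies:
    alpha += successes
    beta += n - successes

# Posterior mean response rate across all pooled evidence.
posterior_mean = alpha / (alpha + beta)
print(f"Pooled response-rate estimate: {posterior_mean:.2f}")
```

The appeal of this kind of model in the real-time setting Kolanovic describes is that it updates incrementally: as each new batch of trial data arrives, the posterior tightens without rerunning a full traditional study.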
Naturally, the kind of granular data analysis discussed above would allow for better assessments when it comes to who can travel, reengage with society and restart economic activity safely. As Marko puts it, “all of the ‘internet of living things’ data… could be processed by AI to recommend to an individual potential restrictions or removal of restrictions”.
This could be optimized not just by country, but down to the regional and even community level, providing for a kind of super-optimized restart of the economy whereby officials can quantify with remarkable specificity where and when to advise the reopening of businesses, etc.
Kolanovic doesn’t think we should wait around on this. “We believe the creation of an ‘internet of living things’ to address pandemic risk should be an immediate and an ongoing priority”, he writes.
He also extends the analysis from the Kinsa Health smart thermometers data (see the linked post above for more). “Extends” is probably an understatement. Marko spends roughly a thousand words explaining the methodology.
Essentially, he combines that data with official data from state officials and hospitals not so much to make any concrete projections, but rather to lead by example when it comes to employing the type of broad-based, holistic approach to epidemic analysis outlined above. He also takes the opportunity to highlight myriad challenges in this type of modeling.
The left pane in the two visuals below shows the incidence of influenza-like illness above or below what would be expected seasonally using the Kinsa data. On a national level, it continues to decline. You’ll recall that the data used in those charts cannot say, specifically, whether a given decline is due to COVID-19. As Kolanovic puts it, “at one extreme (optimistic), all of the decline is due to a COVID-19 decline, and at the other extreme (pessimistic) none of decline is due to COVID-19 [although] cold/flu decline is also important as it frees up hospital capacity”. For the model, he averages the extremes.
He also has to make some assumptions about recovery rates and timelines. For that, he turns to existing information (which suggests a two-week recovery period for mild cases and four weeks for severe infections), then fits “a smooth recovery curve for each day of atypical iLI”. That, in turn, produces a model output for daily positive COVID-19 cases.
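A stylized version of that construction might look like the following. The daily onset numbers are invented, and while the two-week/four-week recovery windows come from the note, the mild/severe split is an assumption added here for illustration; Kolanovic fits a smooth recovery curve rather than the hard cutoffs used below:

```python
# Stylized sketch: convert a daily series of atypical-ILI onsets into an
# estimate of currently active cases, assuming each case stays "active" for
# a fixed recovery window (14 days mild, 28 days severe). The onset series
# and the 80/20 mild/severe split are illustrative assumptions.

MILD_SHARE, MILD_DAYS = 0.8, 14
SEVERE_SHARE, SEVERE_DAYS = 0.2, 28

def active_cases(daily_onsets: list[float]) -> list[float]:
    """For each day, sum the onset cohorts still within their recovery window."""
    out = []
    for today in range(len(daily_onsets)):
        total = 0.0
        for onset_day, n in enumerate(daily_onsets[: today + 1]):
            age = today - onset_day
            if age < MILD_DAYS:
                total += n * MILD_SHARE
            if age < SEVERE_DAYS:
                total += n * SEVERE_SHARE
        out.append(total)
    return out

# With a flat 100 onsets/day, active cases climb as cohorts accumulate,
# then plateau once the oldest cohorts have recovered.
series = active_cases([100.0] * 40)
```

Even this crude version shows why the model output lags the onset data: active-case counts keep rising for weeks after new onsets flatten, which is exactly the dynamic behind Kolanovic's peak-timing estimates.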
Kolanovic emphasizes that the real number is obviously far greater than the number of confirmed cases, but for him, the more crucial metric is “hospital admissions/loads”, as that creates the “risk of a health care system breakdown”.
From there, he explains the specifics behind modeling hospitalization rates. Suffice to say he covers quite a few bases.
(JPMorgan, Kinsa data)
Here is Marko’s summary of the two visuals (and do keep in mind that all of this was based on data through the end of last month):
- Figure 1 shows the % iLI above/below expected seasonal. One can see a continued decline in atypical iLI nationally. In SF county, restrictive measures appear to be very successful and suppressed virtually all iLIs. In NY, iLI declined but is still significantly higher than the national level — as expected given how severe the NY outbreak is.
- Figure 2 shows a model of NYC COVID-19 cases and hospitalization rates based on atypical iLI data, model recovery and hospitalization rates. Modeling steps/assumptions introduce uncertainty for absolute number estimates. However, the main features indicate that the number of active cases likely peaked already, and that hospitalization rates are possibly peaking now.
That latter projection – i.e., that NYC hospitalization rates were possibly peaking – was from six days ago.
On Sunday, Andrew Cuomo said new coronavirus deaths fell to 594 from 630. Obviously, it’s too early to draw conclusions, but Cuomo did note that new hospitalizations, which were 1,095 on Friday, were cut in half, falling all the way to 574.