By Joseph Wills
You may never have heard of Big Data, but you’ve definitely used it. The term “Big Data” refers to the analysis of massive sets of information to detect and predict trends. A familiar example is self-driving cars: data from individual Teslas with Autopilot or Google’s Waymo cars is added to a massive, quickly expanding proprietary data set that is analyzed to predict drivers’ future behavior. That is, of course, just one example; Big Data is becoming increasingly useful in every sector of the economy. It is also rapidly becoming a powerful new tool for scientific study. Researchers in the National Science Foundation (NSF) Expeditions in Computing program have begun applying data-driven analysis to climate change: the gradual, human-driven deterioration of the planet’s ecosystems, accompanied by a rise in extreme weather events and the mass extinction of animal species.
The precise extent of climate change’s reach has yet to be determined; the NSF group argues that current numerical techniques are inherently imprecise and incapable of answering questions about socio-economic issues like water and food availability, human mortality, or biodiversity. The group is therefore developing data-driven models to predict future trends in climate change.
In order to understand the human effects of climate change, models must work on a human scale. Scientists have to downscale data that covers huge tracts of land into data at a scale people can actually observe. The NSF team built an artificial intelligence (AI) called DeepSD that takes coarse climate data and “downscales” it to a higher resolution. In other words, DeepSD takes data covering a large area and predicts what the data would look like for each small chunk of that area. The researchers train the AI by feeding it known precipitation data: given two corresponding data sets, a large, low-resolution image of climate data and higher-resolution data covering the same space, it learns to predict the second from the first. After training, the AI can be given a square of precipitation data 111 km on a side and predict what that patch would look like at a resolution of approximately 13 km by 13 km. In practice, this would theoretically allow the model to predict climate data for Chittenden County from data at a resolution the size of Vermont.
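The training setup described above can be sketched in a few lines of Python. Everything here is illustrative: the gamma-distributed `fine` array is a made-up stand-in for real high-resolution precipitation observations, and the 8x block-averaging factor only approximates the 111 km to 13 km jump; DeepSD itself is a neural network trained on real gridded data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-in for one day of high-resolution gridded
# precipitation (mm/day); real training uses observed data.
fine = rng.gamma(shape=2.0, scale=3.0, size=(64, 64))

factor = 8  # rough analogue of the 111 km -> 13 km resolution jump

# Build the matching low-resolution input by block-averaging:
# each coarse cell is the mean of an 8x8 patch of fine cells.
coarse = fine.reshape(8, factor, 8, factor).mean(axis=(1, 3))

# (coarse, fine) is one training pair: the model is shown `coarse`
# and learns to reconstruct `fine` from it.
print(coarse.shape, fine.shape)  # (8, 8) (64, 64)
```

Generating many such pairs from historical data is what lets the model learn the mapping from coarse inputs to fine outputs, so it can later be applied to coarse data whose fine-scale truth is unknown.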
Big claims must be backed by impressive results; DeepSD’s value is judged by its performance against traditional downscaling methods. Compared against four other predictive models, DeepSD starkly outperformed three of them in bias, error, and predictive skill. Against bias-corrected spatial disaggregation (BCSD), a standard statistical downscaling method, DeepSD performed comparably, showing only marginal differences in predictive skill but a significant advantage in average error.
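The metrics in that comparison are easy to state concretely. A minimal sketch, using made-up numbers rather than anything from the paper: bias is the mean signed difference between prediction and observation, and root-mean-square error is one common way to measure the “average error” discussed above.

```python
import numpy as np

def bias(pred, obs):
    # Mean signed difference: positive means the model predicts
    # too much precipitation on average.
    return float(np.mean(pred - obs))

def rmse(pred, obs):
    # Root-mean-square error: large misses are penalized heavily.
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

obs  = np.array([1.0, 0.0, 3.0, 2.0])   # observed precipitation (mm)
pred = np.array([1.5, 0.0, 2.5, 2.0])   # one model's downscaled output

print(bias(pred, obs))  # 0.0 -- over- and under-predictions cancel
print(rmse(pred, obs))  # ~0.354 -- but the errors themselves remain
```

Note how the two metrics disagree: this toy model has zero bias yet nonzero error, which is why evaluations like the one above report several metrics rather than one.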
Just as AI shifted paradigms in transportation (self-driving cars), human-computer interfaces (personal assistants like Siri), and social media, DeepSD and its descendants will break open a new branch of scientific inquiry. DeepSD was trained on precipitation data, but it can just as easily be trained on a huge variety of climate variables. Climate change is the biggest ecological challenge facing humanity; DeepSD shows how the data available now can be downscaled to a more recognizable scale. And once humanity fully recognizes climate change, we can work to end it. In a nutshell, DeepSD is a lens through which large data sets can be boiled down to what’s relevant for humans.
References:
Vandal, T., Kodra, E., Ganguly, S., Michaelis, A., Nemani, R., & Ganguly, A. R. (2017). DeepSD: Generating high resolution climate change projections through single image super-resolution. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '17). doi:10.1145/3097983.3098004