On the Data Science team at Uptake, we specialize in knowing which data best tracks the health of machines and equipment and, just as importantly, which methods are best for analyzing that data.
This week I’m speaking at the ICSA/Graybill Applied Statistics Symposium in Fort Collins, Colorado. Held every year since 2003, it’s an important international gathering where researchers at the front lines of statistics meet to communicate new ideas and advances in the field. I’ll be presenting a new method in spatial statistics, which is a rapidly developing field that involves building models for data that occurs in space and time.
The practice of spatial statistics has expanded widely because of the recent explosion in data collection, increased interest in and adoption of machine learning methods, and ever-improving computing power. At the University of Chicago, my PhD dissertation combined deep mathematical analysis of the core concepts of spatial statistics with advanced applications of new methods to several data sets and scenarios.
What does this research have to do with Uptake and the predictive analytics software we build for major industries?
Quite a lot. Weather and environmental conditions greatly affect how industrial machines operate. An engine fan pushes air differently based on the levels of atmospheric pressure. Driving a truck on muddy soil wears tires differently than on dry dirt. A vehicle manager can tell with near certainty that snow is present on a route based solely on the behavior of the sensors onboard a particular vehicle.
A Case Study
We take these environmental factors into account to produce the most detailed analytics and to give operators and managers of high-value productive assets the information they need to make more informed, empowered decisions. In some cases this is straightforward (ambient temperature readings on different vehicles are often readily available), but in many cases it can be quite difficult.
Take for instance snowfall, or more specifically, snow on the ground. This affects the travel times of vehicles, and heavy snowfall, of course, stops some travel altogether. From a mechanical perspective, parts on a machine may operate differently in the presence of snow, and knowing the extent of snow on the ground helps managers plan routes and provide the proper interpretations of data coming off machines. But this kind of precise knowledge can be difficult to come by.
The National Oceanic and Atmospheric Administration (NOAA) collects some of the most up-to-date information on snow depth in the United States. A number of weather stations across the country collect data, and in some locations, the results are updated once daily. These datasets represent the tremendous efforts of NOAA, not to mention numerous volunteers, to collect this information. But like all weather station data of this nature, they have one big limitation: observations are made at a necessarily limited number of locations. You can’t have a weather station (and a volunteer to operate it) everywhere.
For a case study, I’ll return to my snowfall example. From late January through February of 2015, Boston and much of Massachusetts had record-breaking snowfall, and weather stations across the state captured snow depth data during that time. The locations of these are shown in the following plot. As you can see, although there are several sites, there are large empty spaces between them where no snow depth information is collected:
By plotting snow depth through time at one weather station near Boston, we can quickly identify the start of the winter onslaught. However, at other weather stations in Massachusetts, there is little to no substantial change:
Because snowfall varied so much across Massachusetts during this period, producing a complete snow depth map requires interpolating the observed data to unobserved locations. This is the specialty of spatial statistics, and to solve this problem we can draw on many of the sophisticated methods developed in this discipline.
Before jumping to more complicated methods, it’s important to look at simpler ones for comparison. Simple averaging techniques are one attractive choice because they are intuitive. Unfortunately, many of these methods are inadequate for producing an overall estimate. Here is one such example: a map of estimated snow depth values for January 31, 2015, where each estimate is a weighted average of the nearby observations. In this case, the weights are based on inverse spatial distances.
We can immediately see that these estimates of snowfall are unrealistic. In cases like this, simple averaging techniques suffer from various mathematical side effects. For example, the locations of maxima and minima in the above plot correspond directly to the weather station locations. This phenomenon, of course, does not correspond to a physical reality.
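To make the side effect concrete, here is a minimal sketch of inverse-distance weighting in Python. The station coordinates, snow depths, grid, and the `power` exponent are made-up assumptions for illustration, not NOAA data or the exact settings behind the map. Because every estimate is an average with positive weights that sum to one, it can never exceed the largest observation or fall below the smallest, so the extremes of the map necessarily sit at the stations.

```python
import numpy as np

def idw_interpolate(station_xy, station_values, query_xy, power=2.0):
    """Inverse-distance-weighted average of station values at each query point."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise distances, shape (n_query, n_station).
    diff = query_xy[:, None, :] - station_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    estimates = np.empty(len(query_xy))
    for i, d in enumerate(dist):
        at_station = d == 0.0
        if at_station.any():
            # The query point coincides with a station: return its observation.
            estimates[i] = station_values[at_station].mean()
        else:
            weights = 1.0 / d ** power
            estimates[i] = np.sum(weights * station_values) / np.sum(weights)
    return estimates

# Hypothetical stations (x, y in km) and snow depths (cm); assumed values only.
stations = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
depths = np.array([5.0, 60.0, 0.0, 20.0])

# Regular grid covering the same square, flattened to a list of points.
gx, gy = np.meshgrid(np.linspace(0, 10, 21), np.linspace(0, 10, 21))
grid = np.column_stack([gx.ravel(), gy.ravel()])
snow_map = idw_interpolate(stations, depths, grid)

# Location of the largest estimate on the grid.
peak_location = grid[np.argmax(snow_map)]
```

In this toy setup, `peak_location` is the station with the deepest snow, which mirrors the bullseye pattern around stations in the averaged map.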
Without adding any more data to this scenario, we can make some positive steps by using more scientific—though also more complicated—spatial models:
Visually, this map gives a more convincing representation of snow depth. The locations of the weather stations are much less prominent in this map (though several can still be identified), and overall, the features in this image are more complex. Most importantly, prediction of snow depth using this model is more accurate. When we have better measures of snow depth, we can better account for travel times and how machinery will be affected.
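This post doesn’t spell out the spatial model behind the map, so the following is only a sketch of one standard technique from the field, ordinary kriging, and not necessarily the model used here. It assumes an exponential covariance function with fixed, hand-picked `sill` and `corr_range` parameters. In practice these would be estimated from the data. The stations and depths are again hypothetical.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between rows of a and rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def exp_cov(dist, sill, corr_range):
    """Exponential covariance: correlation decays smoothly with distance."""
    return sill * np.exp(-dist / corr_range)

def ordinary_kriging(station_xy, station_values, query_xy,
                     sill=1.0, corr_range=5.0):
    """Ordinary kriging predictions under an assumed exponential covariance."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    n = len(station_xy)

    # Covariances among stations and between stations and query points.
    K = exp_cov(pairwise_dist(station_xy, station_xy), sill, corr_range)
    k = exp_cov(pairwise_dist(station_xy, query_xy), sill, corr_range)

    # Augmented linear system; the extra row and column force the
    # weights for each query point to sum to one.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = K
    A[n, n] = 0.0
    b = np.vstack([k, np.ones((1, k.shape[1]))])

    weights = np.linalg.solve(A, b)[:n]  # shape (n_station, n_query)
    return weights.T @ station_values

# Hypothetical stations (x, y in km) and snow depths (cm); assumed values only.
stations = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
depths = np.array([5.0, 60.0, 0.0, 20.0])

gx, gy = np.meshgrid(np.linspace(0, 10, 21), np.linspace(0, 10, 21))
grid = np.column_stack([gx.ravel(), gy.ravel()])
kriged_map = ordinary_kriging(stations, depths, grid)
```

Unlike a plain inverse-distance average, the kriging weights come from a model of how observations co-vary with distance, and they can be negative, so predictions are not confined to the range of the observed values.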
Real Problems, Real Solutions, In Real Time
This spatial model is far from perfect. By adding more data and data sources to this estimation—for example, temperature data—we can find better measures of snow depth. At Uptake, however, it’s quite clear how useful these methods are for our goal of best accounting for the environmental impact on the machines we help monitor.
The power of spatial statistics, and the power of predictive modeling in general, doesn’t stop with weather prediction. A component that fails on a machine can damage other components, depending on where it sits. Temperature in the hottest part of an engine, which indicates wear, isn’t always measured directly, but it can be inferred from neighboring temperature sensors using the kind of interpolation methods found in spatial statistics. Tracking sensor movements through time requires approaches similar to spatial methods.
At Uptake, we have assembled an elite team of technologists, data scientists, user experience experts and more, and we are building the most sophisticated platform to provide actionable insights in major industries. The breadth and depth of our data science team allow us to stop at nothing to create the most complete and actionable models for our target industries. Put more simply, we are seeing opportunities in this data that have never been seen before. And as Uptake grows, our predictive models learn and grow, so with time we become even more valuable to our customers. We are at an inflection point in major industries where predictive insight has become an extremely valuable tool that will differentiate great companies from the rest. Our data science is at the center of this inflection, and just as data isn’t just numbers in a database, data science isn’t just a practice secluded from the world. Uptake is solving real problems, and creating real solutions, in real time.