A Generative, Multiresolution Model for Quality Control in Ecological Data
Contemporary environmental science increasingly relies on networks of distributed, automated sensors in remote locations. Decreased cost and improved portability of these sensors have allowed researchers to monitor landscapes at very fine spatial and temporal granularities. An instrumented research site may generate dozens to hundreds of near-continuous data streams of environmental measurements. However, in-situ sensors are often subject to harsh conditions that can cause individual sensors to malfunction and network communications to fail. Quality control (QC) is therefore essential to identify incorrect measurements before these data are assimilated into models and analyses. The abundance of data, though, makes manual inspection by domain experts impractical and delays the release of data.
In this poster, we describe a generative modeling approach to automated QC. We provide a probabilistic framework that maintains a distribution over the functioning state of a sensor and the true value of the monitored phenomenon. This framework facilitates real-time QC, in which we simultaneously diagnose the working state of the sensor and infer a distribution over its current reading. We explore machine learning techniques for learning the joint relationship among different types of sensors at a monitoring site. We evaluate our model using data from three meteorological stations deployed in the H.J. Andrews Forest, a Long-term Ecological Research (LTER) site in western Oregon, and compare our results to existing single-sensor and multiple-sensor QC models.
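As a rough illustration of what maintaining a distribution over both the sensor state and the true value can look like, the sketch below performs one Bayesian update for a single sensor. It is not the model presented in the poster: the two-state (working or faulty) sensor, the Gaussian prior and noise, and every name and default value (`qc_update`, `noise_std`, `p_working`, `fault_mean`, `fault_std`) are assumptions chosen only for this example.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def qc_update(reading, prior_mean, prior_std, noise_std=0.5,
              p_working=0.95, fault_mean=0.0, fault_std=30.0):
    """One QC step for a hypothetical two-state sensor model.

    Assumptions (illustrative only):
      - true value ~ N(prior_mean, prior_std^2)
      - working sensor: reading = true value + N(0, noise_std^2) noise
      - faulty sensor: reading ~ N(fault_mean, fault_std^2), unrelated
        to the true value
    Returns (P(working | reading), posterior mean, posterior std).
    """
    # Marginal likelihood of the reading under each sensor state.
    working_std = math.sqrt(prior_std ** 2 + noise_std ** 2)
    joint_working = p_working * normal_pdf(reading, prior_mean, working_std)
    joint_faulty = (1.0 - p_working) * normal_pdf(reading, fault_mean, fault_std)
    post_working = joint_working / (joint_working + joint_faulty)

    # If the sensor works, the reading is informative: conjugate Gaussian update.
    gain = prior_std ** 2 / (prior_std ** 2 + noise_std ** 2)
    mean_if_working = prior_mean + gain * (reading - prior_mean)
    var_if_working = (1.0 - gain) * prior_std ** 2

    # If the sensor is faulty, the reading says nothing, so the prior is kept.
    # The overall posterior is a two-component mixture; report its moments.
    mean = post_working * mean_if_working + (1.0 - post_working) * prior_mean
    second_moment = (
        post_working * (var_if_working + mean_if_working ** 2)
        + (1.0 - post_working) * (prior_std ** 2 + prior_mean ** 2)
    )
    std = math.sqrt(max(second_moment - mean ** 2, 0.0))
    return post_working, mean, std


if __name__ == "__main__":
    # A plausible air-temperature reading versus an implausible spike.
    print(qc_update(10.5, prior_mean=10.0, prior_std=2.0))
    print(qc_update(80.0, prior_mean=10.0, prior_std=2.0))
```

In this toy setup, a reading near the prior is attributed to a working sensor and tightens the estimate of the true value, while an implausible reading is attributed to a fault and leaves the prior essentially unchanged. A multi-sensor version would replace the fixed prior with a prediction from the other sensors at the site.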