Toward autonomous detection of anomalous GNSS data via applied unsupervised artificial intelligence¶

Unsupervised Anomaly Detection of TZVOLCANO GNSS Data using Gaussian Mixtures. Once loaded in Binder, please run all the cells to properly initialize values and GUI elements.

Authors¶

Mike Dye, D. Sarah Stamps, Myles Mason

Author1 = {“name”: “Mike Dye”, “affiliation”: “Unaffiliated”, “email”: “mike@mikedye.com”, “orcid”: “0000-0003-2065-870X”}
Author2 = {“name”: “D. Sarah Stamps”, “affiliation”: “Virginia Tech”, “email”: “dstamps@vt.edu”, “orcid”: “0000-0002-3531-1752”}
Author3 = {“name”: “Myles Mason”, “affiliation”: “Virginia Tech”, “email”: “mylesm18@vt.edu”, “orcid”: “0000-0002-8811-8294”}

This notebook demonstrates a process by which GNSS data (longitude, latitude, and height) obtained from the TZVOLCANO CHORDS portal (Stamps et al., 2016) can be analyzed with minimal human input to remove data points that reflect high noise, instrumentation error, or other factors that introduce large errors into specific measurements. These prepared and cleaned data are then used to train a neural network that can be used for detecting volcanic activity.

This notebook takes advantage of the EarthCube-funded CHORDS infrastructure (Daniels et al., 2016; Kerkez et al., 2016), which powers the TZVOLCANO CHORDS portal. The GNSS positioning data (longitude, latitude, and height) come from the active Ol Doinyo Lengai volcano in Tanzania and are made available through UNAVCO’s real-time GNSS data services, which provide real-time positions processed by the Trimble Pivot system. Real-time GNSS data from several instruments are streamed into the TZVOLCANO portal using brokering scripts developed by Joshua Robert Jones in Python and D. Sarah Stamps in awk, which makes the data instantly available via the CHORDS data API service.

Technical contributions¶

Creation of a Python-based API client to download data from a CHORDS portal
Development of local libraries to download, manipulate, and plot GNSS data (longitude, latitude, and height) from a CHORDS portal that obtains positions from UNAVCO’s real-time GNSS data services
Identification and removal of statistical outliers in GNSS time-series data using the Gaussian Mixtures algorithm
Identification and removal of statistical outliers in GNSS time-series data using the K-means algorithm
Implementation of a neural net model that, when trained on these data, can make predictions based on the historical time series
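
To make the Gaussian Mixtures outlier step more concrete, here is a minimal sketch of one way it could work. It assumes scikit-learn's `GaussianMixture` and flags the points with the lowest log-likelihood under the fitted mixture. The function name `flag_outliers_gmm`, the two-component setting, the 3% cutoff, and the synthetic height series are all illustrative assumptions, not the notebook's actual code or data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def flag_outliers_gmm(values, n_components=2, percentile=3.0, seed=0):
    """Return a boolean mask that is True for likely outliers.

    values: 1-D sequence holding one feature (for example, scaled height).
    percentile: share of lowest-likelihood points to flag (assumed cutoff).
    """
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
    # Per-sample log-likelihood under the fitted mixture.
    log_likelihood = gmm.score_samples(X)
    threshold = np.percentile(log_likelihood, percentile)
    return log_likelihood < threshold


# Synthetic stand-in for a GNSS height series: tight noise plus three spikes.
rng = np.random.default_rng(42)
height = rng.normal(loc=0.0, scale=0.01, size=500)
spike_idx = np.array([50, 200, 350])
height[spike_idx] += np.array([0.5, -0.8, 1.2])

mask = flag_outliers_gmm(height)
cleaned = height[~mask]
```

A percentile cutoff always removes a fixed share of points, so it also discards some ordinary samples; an absolute likelihood threshold would be an alternative design choice.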

Methodology¶

Select instrument and date range of positioning data (longitude, latitude, and height) to analyze
Download selected data set from TZVOLCANO CHORDS portal
Scale and impute data to prepare them for machine learning algorithms
Use a Gaussian Mixtures algorithm and then a K-means algorithm to identify and remove data points likely to have significant noise from each feature/variable
Train three neural networks: one using the unfiltered data and two using the “cleaned” data output from the Gaussian Mixtures and K-means algorithms
Use these neural nets to make predictions (forecasts) of future data points
Compare these predictions to actual values from the unmodified data set to quantify the reduction in noise achieved by the filtering algorithm
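
The scaling, imputation, and K-means filtering steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's implementation: it uses scikit-learn's `SimpleImputer`, `StandardScaler`, and `KMeans`, assumes mean imputation and z-score scaling, and treats the most populated of two clusters as the "clean" data. The helper names `prepare` and `keep_largest_cluster` and the synthetic input are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def prepare(features):
    """Fill missing values with the column mean, then standardize each column."""
    imputed = SimpleImputer(strategy="mean").fit_transform(features)
    return StandardScaler().fit_transform(imputed)


def keep_largest_cluster(scaled, n_clusters=2, seed=0):
    """Cluster the scaled rows and return a mask for the most populated cluster.

    Assumption: noisy points are rare, so they end up in a small cluster.
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    largest = np.bincount(labels).argmax()
    return labels == largest


# Synthetic stand-in for (longitude, latitude, height) offsets.
rng = np.random.default_rng(0)
raw = rng.normal(0.0, 0.01, size=(300, 3))
raw[[10, 150, 290]] += 1.0   # three rows with large errors
raw[5, 2] = np.nan           # two missing values to impute
raw[77, 0] = np.nan

scaled = prepare(raw)
keep = keep_largest_cluster(scaled)
cleaned = raw[keep]          # original (unscaled) rows that survive the filter
```

Filtering each feature separately, as the methodology describes, would amount to calling the same helper on one column at a time.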

Results¶

Compared to the neural net trained on the unfiltered data, the neural nets trained on the filtered (or “cleaned”) data output by the machine learning algorithms (Gaussian Mixtures and K-means) do a significantly better job of generating predictions.
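
One simple way to put a number on "better" is to compare forecast errors against the actual values. The sketch below assumes root-mean-square error (RMSE) as the metric, which may differ from what the notebook uses; the values are made up for illustration and are not results from the notebook.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_error, candidate_error):
    """Fractional error reduction of the candidate relative to the baseline."""
    return (baseline_error - candidate_error) / baseline_error


# Made-up numbers for illustration only.
actual = [1.00, 1.02, 0.98, 1.01]
pred_unfiltered = [1.10, 0.90, 1.08, 0.93]   # model trained on raw data
pred_filtered = [1.01, 1.01, 0.99, 1.00]     # model trained on cleaned data

err_unfiltered = rmse(pred_unfiltered, actual)
err_filtered = rmse(pred_filtered, actual)
gain = relative_improvement(err_unfiltered, err_filtered)
```

A `gain` close to 1 would mean the filtered model removes most of the baseline error, while a value near 0 would mean filtering made little difference.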

Funding¶

The development of this notebook was not directly supported by any awards; however, the notebook leverages the EarthCube cyberinfrastructure CHORDS, which was funded by the National Science Foundation.

Award1 = {“agency”: “US National Science Foundation”, “award_code”: “1440133”, “award_URL”: “https://www.nsf.gov/awardsearch/showAward?AWD_ID=1440133&HistoricalAwards=false”}
Award2 = {“agency”: “US National Science Foundation”, “award_code”: “1639750”, “award_URL”: “https://www.nsf.gov/awardsearch/showAward?AWD_ID=1639750&HistoricalAwards=false”}
Award3 = {“agency”: “US National Science Foundation”, “award_code”: “1639554”, “award_URL”: “https://www.nsf.gov/awardsearch/showAward?AWD_ID=1639554&HistoricalAwards=false”}

Keywords¶

keywords=[“TZVOLCANO”, “CHORDS”, “UNAVCO”, “Artificial Intelligence”, “Machine Learning”]

Citation¶

Dye, Mike, D. Sarah Stamps, Myles Mason (2021), Jupyter Notebook: Toward autonomous detection of anomalous GNSS data via applied unsupervised artificial intelligence, EarthCube Annual Meeting 2021

Suggested next steps¶

A Support Vector Machine should be investigated as a possible filtering mechanism.
The CHORDS API should be made more robust and flexible.
Predictions from the improved trained neural net model should be compared in real-time to incoming GNSS data to attempt to identify emerging volcanic events.
Test if filtering data with both the Gaussian Mixtures and K-means in combination would further improve the neural net predictions.
Use this same filtering process on time-series data from other CHORDS portals.
Update the citation with a DOI.
Investigate and compare the approach used in this notebook with benchmarks from classical time series filtering and prediction *

* As suggested by anonymous reviewer

Acknowledgements¶

CHORDS: for providing a versatile and practical cyber-infrastructure component
Virginia Tech: for enabling an incredibly supportive, cutting-edge learning and research environment
EarthCube & EarthCube Office: for creating the opportunity to create and share this notebook and for providing a well-designed Jupyter notebook template
Abbi Devins-Suresh: for testing this notebook and invaluable feedback

License¶

This notebook is licensed under the MIT License.

Glossary¶

Brief definitions are provided below for terms that may be unfamiliar to readers without machine learning experience, or that are used here in ways that may be unusual or ambiguous.

[Feature](https://en.wikipedia.org/wiki/Feature_(machine_learning)): “an individual property or characteristic of a phenomenon being observed” (Wikipedia contributors, 2021). In this notebook, the imported fields (Time, Height, Longitude, and Latitude) are the initial features. One additional feature is calculated on the fly: the vector magnitude of the scaled values of the original fields.
Impute: In machine learning, the replacement of null or missing values with an actual value in order to facilitate processing by an algorithm.
Anomaly: Data that for varying reasons do not occur within the usual ranges. In this notebook, there are (at least) two types of anomalies that may occur: those due to inaccurate measurements and subsequent processing, and those due to actual volcanic events.
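
To illustrate the "Impute" and "Feature" entries, the following standard-library sketch fills missing values with the column mean and then derives a vector-magnitude feature from the scaled fields. Mean imputation and z-score scaling are assumptions, since the glossary does not specify which methods the notebook uses; the helper names and sample values are hypothetical.

```python
import math
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Scale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def vector_magnitude(*columns):
    """Row-wise Euclidean norm across equally long columns."""
    return [math.sqrt(sum(x * x for x in row)) for row in zip(*columns)]


# Illustrative values only, each column with one missing reading.
height = [10.0, None, 10.2, 9.8]
longitude = [35.90, 35.91, None, 35.89]
latitude = [-2.76, -2.75, -2.77, None]

scaled = [standardize(impute_mean(col)) for col in (height, longitude, latitude)]
magnitude = vector_magnitude(*scaled)   # the extra feature, one value per row
```

Because the magnitude grows when any scaled field departs from its mean, a single derived feature can expose rows that are unusual in any of the three coordinates.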