Volcanic activity detection and noise characterization using machine learning¶

Author(s)¶

Author1 = {“name”: “Myles Mason”, “affiliation”: “Virginia Tech”, “email”: “mylesm18@vt.edu”, “orcid”: “0000-0002-8811-8294”}
Author2 = {“name”: “John Wenskovitch”, “affiliation”: “Virginia Tech”, “email”: “jw87@vt.edu”, “orcid”: “0000-0002-0573-6442”}
Author3 = {“name”: “D. Sarah Stamps”, “affiliation”: “Virginia Tech”, “email”: “dstamps@vt.edu”,”orcid”: “0000-0002-3531-1752”}
Author4 = {“name”: “Joshua Robert Jones”, “affiliation”: “Virginia Tech”, “email”: “joshj55@vt.edu”, “orcid”: “0000-0002-6078-4287”}
Author5 = {“name”: “Mike Dye”, “affiliation”: “Unaffiliated”, “email”: “mike@mikedye.com”, “orcid”: ” 0000-0003-2065-870X”}

Purpose¶

This Jupyter notebook explores methods towards characterizing noise and eventually predicting volcanic activity for Ol Doinyo Lengai (an active volcano in Tanzania) with machine learning using the TZVOLCANO CHORDS portal. Machine learning is a powerful tool that enables the automatization of complex mathematical and analytical models. In this Jupyter notebook, the components are time, height, latitude, and longitude. The predicted component values are the following heights. This project uses Global Navigation Satellite System (GNSS) data from the EarthCube CHORDS portal TZVOLCANO (Stamps et al. 2016; Daniels et al., 2016; Kerkez et al., 2016), which is the online interface for obtaining open-access real-time positioning data collected around Ol Doinyo Lengai. The bulk of the project is the exploration of the data and later prediction of height points. The station that this project analyzes is OLO1 for days 12/16/2020 and 04/16/2021.

Technical contributions¶

The training of the models and analysis uses basic linear algebra and statistics
The main libraries used (NumPy and pandas) are both libraries for data manipulation and linear algebra
The TZVOLCANO CHORDS portal linked above is the location of the data
Implementation of Linear Regression for prediction on time-series data

Methodology¶

The desired data was imported and selected with a range of about five months. The user downloads data from the TZVOLCANO CHORDS portal as a geojson file (.JSON). Previously downloaded JSON’s are converted to a Dataframe for easy manipulation further in the project. The information is then visualized for better analysis and statistical metrics running. Sample data size is increased for the project; A second JSON is introduced at the beginning of the data processing and analysis, which will later be used for OLO1 to predict height data from OLO1 on 4/16/2021 using 12/16/2020. We set up four series objects from the original data frame to be used for inputs for machine learning algorithms. The data set up for a linear regression shows the difference in target vs. predicted data in scatter plot form. Finally, nine predictions are displayed graphically with adjusting the test size data to demonstrate the different day prediction positions.

Results¶

This notebook explored predicting height data from the TZVOLCANO CHORDS portal using Linear Regression from different days. It also evaluates how much test data is needed to best predict height data. We find that having 10% test data yields the best results for predictions with the Mean Squared Error of 8.325e-5% . For predictions from data inputted and predicted from a single day we find the 75% test data yields the best results with an average error of -1.074e-4 meters.

Funding¶

Award1 = {“agency”: “National Science Foundation EarthCube Program”, “award_code”: “1639554”, “award_URL”: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1639554&HistoricalAwards=false }
Award2 = {“agency”: “Virginia Tech Academy of Integrated Sciences Hamlett Undergraduate Research Award”, “award_code”: “44672”, “award_URL”: “”}

Keywords¶

keywords=[“tzDF”, “Linear Regression”, “Concat”, “Transpose”,”Mean Squared Error(MSE)”]

Citation¶

Mason, Myles, John Wenskovitch, D. Sarah Stamps, Joshua Robert Jones, Mike Dye (2021), EC_01_Volcanic_activity_detection_and_noise_characterization_using_machine learning, EarthCube Annual Meeting.

Suggested next steps¶

Noise activity in the data set is a topic that should be explored. Noise exploration is crucial to the development of this project because understanding the noise and being able to filter and characterize noise will allow any day to be analyzed efficiently and for more reliable predictions of hazardous volcanic deformation. Future next steps would include applying more classifying algorithms such as DBSCAN to characterize the noise.

Acknowledgements¶

Virginia Tech Department of Geosciences
Alice and Luther Hamlet

	time	test	measurements_lat	measurements_height	measurements_lon
0	2020-12-16T05:08:18Z	false	2.734205	988.203	35.950217
1	2020-12-16T05:08:19Z	false	2.734205	988.206	35.950217
2	2020-12-16T05:08:20Z	false	2.734205	988.214	35.950217
3	2020-12-16T05:08:21Z	false	2.734205	988.226	35.950217
4	2020-12-16T05:24:41Z	false	2.734205	988.199	35.950217
...	...	...	...	...	...
5985	2020-12-17T04:48:44Z	false	2.734205	988.139	35.950217
5986	2020-12-17T04:53:51Z	false	2.734205	988.147	35.950217
5987	2020-12-17T04:53:52Z	false	2.734205	988.132	35.950217
5988	2020-12-17T04:55:09Z	false	2.734205	988.137	35.950217
5989	2020-12-17T04:59:54Z	false	2.734205	988.147	35.950217

	history_1	history_2	history_3	history_4	history_5	history_6	history_7	history_8	history_9	history_10	history_11	history_12	history_13	history_14	history_15	history_16	history_17	history_18	history_19	history_20
0	988.206	988.214	988.226	988.199	988.220	988.212	988.202	988.167	988.167	988.126	988.129	988.132	988.151	988.135	988.108	988.114	988.129	988.169	988.162	988.147
1	988.214	988.226	988.199	988.220	988.212	988.202	988.167	988.167	988.126	988.129	988.132	988.151	988.135	988.108	988.114	988.129	988.169	988.162	988.147	988.143
2	988.226	988.199	988.220	988.212	988.202	988.167	988.167	988.126	988.129	988.132	988.151	988.135	988.108	988.114	988.129	988.169	988.162	988.147	988.143	988.146
3	988.199	988.220	988.212	988.202	988.167	988.167	988.126	988.129	988.132	988.151	988.135	988.108	988.114	988.129	988.169	988.162	988.147	988.143	988.146	988.142
4	988.220	988.212	988.202	988.167	988.167	988.126	988.129	988.132	988.151	988.135	988.108	988.114	988.129	988.169	988.162	988.147	988.143	988.146	988.142	988.136
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
995	988.100	988.106	988.131	988.121	988.129	988.129	988.146	988.148	988.131	988.130	988.145	988.112	988.127	988.140	988.114	988.122	988.151	988.164	988.140	988.143
996	988.106	988.131	988.121	988.129	988.129	988.146	988.148	988.131	988.130	988.145	988.112	988.127	988.140	988.114	988.122	988.151	988.164	988.140	988.143	988.146
997	988.131	988.121	988.129	988.129	988.146	988.148	988.131	988.130	988.145	988.112	988.127	988.140	988.114	988.122	988.151	988.164	988.140	988.143	988.146	988.132
998	988.121	988.129	988.129	988.146	988.148	988.131	988.130	988.145	988.112	988.127	988.140	988.114	988.122	988.151	988.164	988.140	988.143	988.146	988.132	988.133
999	988.129	988.129	988.146	988.148	988.131	988.130	988.145	988.112	988.127	988.140	988.114	988.122	988.151	988.164	988.140	988.143	988.146	988.132	988.133	988.117

EarthCube 2021 Call for Notebooks

Volcanic activity detection and noise characterization using machine learning¶

Author(s)¶

Purpose¶

Technical contributions¶

Methodology¶

Results¶

Funding¶

Keywords¶

Citation¶

Suggested next steps¶

Acknowledgements¶

Setup¶

Library import¶

Parameter definitions¶

Data import¶

Data processing and analysis¶

Visualization of basic statistics from measurements_height and linear regression¶

Initiate Linear Regression¶

Linear Regression from single day¶

Method to increase sample size for the model¶

Increased data points for linear regression¶

35% Test Data demonstration¶

55% Test Data demonstration¶

75% Test Data demonstration¶

Using one day’s data to predict a different day’s data¶

Prediction from 10% test size data¶

Prediction from 20% test size data¶

Prediction from 30% test size data¶

Prediction from 40% test size data¶

Prediction from 50% test size data¶

Prediction from 60% test size data¶

Prediction from 70% test size data¶

Prediction from 80% test size data¶

Prediction from 90% test size data¶

Results¶

References¶