Automated Machine Learning for Earth Science via AutoGluon


  • Author1 = {“name”: “Xingjian Shi”, “affiliation”: “Amazon Web Services”, “email”: “”, “orcid”: “”}

  • Author2 = {“name”: “Wen-ming Ye”, “affiliation”: “Amazon Web Services”, “email”: “”, “orcid”: “”}

  • Author3 = {“name”: “Nick Erickson”, “affiliation”: “Amazon Web Services”, “email”: “”, “orcid”: “”}

  • Author4 = {“name”: “Jonas Mueller”, “affiliation”: “Amazon Web Services”, “email”: “”, “orcid”: “”}

  • Author5 = {“name”: “Alexander Shirkov”, “affiliation”: “Amazon Web Services”, “email”: “”, “orcid”: “”}

  • Author6 = {“name”: “Zhi Zhang”, “affiliation”: “Amazon Web Services”, “email”: “”, “orcid”: “”}

  • Author7 = {“name”: “Mu Li”, “affiliation”: “Amazon Web Services”, “email”: “”, “orcid”: “”}

  • Author8 = {“name”: “Alexander Smola”, “affiliation”: “Amazon Web Services”, “email”: “”, “orcid”: “”}


In this notebook, we introduce AutoGluon to the Earth science community. AutoGluon is an automated machine learning toolkit that lets users solve machine learning problems with a single line of code. Many Earth science problems involve tabular datasets. With AutoGluon, you feed in the raw data table and specify the label column, and AutoGluon delivers a model with reasonable performance in a short period of time. You can also analyze the importance of each feature column with a single line of code. Below, we illustrate how to use AutoGluon to build machine learning models for two Earth science problems.


We have pre-installed AutoGluon via pip. Here, we will fix the random seed.

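# Uncomment below to install autogluon
# !python3 -m pip install autogluon
import random
import numpy as np
# Fix the random seeds for reproducibility (the seed value 123 is an arbitrary choice)
random.seed(123)
np.random.seed(123)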

Forest Cover Type Classification

In the first example, we predict the forest cover type (the predominant kind of tree cover) from strictly cartographic variables. The dataset is downloaded from Kaggle Forest Cover Type Prediction. The study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. The actual forest cover type for a given 30 x 30 meter cell was determined from US Forest Service (USFS) Region 2 Resource Information System data. Independent variables were then derived from data obtained from the US Geological Survey and USFS. The data is in raw form and contains binary columns for qualitative independent variables such as wilderness areas and soil type. Let’s first download the dataset.

!wget -O
!unzip -o -d forest-cover-type-prediction
--2021-05-16 08:43:12--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26555059 (25M) [application/zip]
Saving to: ‘’

forest-cover-type-p 100%[===================>]  25.32M  63.0MB/s    in 0.4s    

2021-05-16 08:43:13 (63.0 MB/s) - ‘’ saved [26555059/26555059]

  inflating: forest-cover-type-prediction/sampleSubmission.csv  
  inflating: forest-cover-type-prediction/  
  inflating: forest-cover-type-prediction/test.csv  
  inflating: forest-cover-type-prediction/  
  inflating: forest-cover-type-prediction/test3.csv  
  inflating: forest-cover-type-prediction/train.csv  
  inflating: forest-cover-type-prediction/  

Here, we load and visualize the dataset. For the purpose of demonstration, we subsample the dataset to 5000 samples. We then split it into 75% training and 25% development data (the default of train_test_split) so that we can report the score on the development data.

import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('forest-cover-type-prediction/')
df = df.drop(columns='Id')  # the positional axis argument is not accepted by recent pandas versions
df = df.sample(5000, random_state=100)
train_df, dev_df = train_test_split(df, random_state=100)
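
When no size is given, train_test_split holds out 25% of the rows, which matches the 3750 training rows reported in the training log. The following is a minimal sketch of how the split ratio could be stated explicitly. The `demo_df` frame is a synthetic stand-in (an assumption for illustration), not the forest cover data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the subsampled table (5000 rows); not the real data.
demo_df = pd.DataFrame({
    "feature": np.arange(5000),
    "label": np.arange(5000) % 7 + 1,
})

# Default behaviour: 25% of the rows are held out when no size is given.
default_train, default_dev = train_test_split(demo_df, random_state=100)

# Explicit 80/20 split via test_size.
train_80, dev_20 = train_test_split(demo_df, test_size=0.2, random_state=100)

print(len(default_train), len(default_dev))  # 3750 1250
print(len(train_80), len(dev_20))            # 4000 1000
```

Stating test_size explicitly keeps the prose and the code in agreement if the default ever changes.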

By visualizing the dataset, we can see that there are 54 feature columns and 1 label column called "Cover_Type".

       Elevation  Aspect  Slope  Horizontal_Distance_To_Hydrology  Vertical_Distance_To_Hydrology  Horizontal_Distance_To_Roadways  Hillshade_9am  Hillshade_Noon  Hillshade_3pm  Horizontal_Distance_To_Fire_Points  ...  Soil_Type32  Soil_Type33  Soil_Type34  Soil_Type35  Soil_Type36  Soil_Type37  Soil_Type38  Soil_Type39  Soil_Type40  Cover_Type
 7449       2762      17     16                               270                              49                             2639            206             206            134                                 268  ...            0            0            0            0            0            0            0            0            0           5
13086       2283     109     11                                 0                               0                             1138            240             227            116                                1187  ...            0            0            0            0            0            0            0            0            0           4
14221       3220      82     14                               247                              66                             3328            239             214            103                                 819  ...            1            0            0            0            0            0            0            0            0           1
  768       3021      68      8                               201                              26                             4134            228             225            130                                2493  ...            0            0            0            0            0            0            0            0            0           1
 6132       2446      76     21                               469                             105                              726            241             196             75                                1401  ...            0            0            0            0            0            0            0            0            0           6

5 rows × 55 columns

Train Model with One Line

Next, we train a model in AutoGluon with a single line of code. We only need to specify the label column before calling .fit(). Here, the label column is Cover_Type. AutoGluon infers the problem type automatically. In our example, it correctly determines that this is a “multiclass” classification problem and outputs the model with the best accuracy. Internally, it also infers the type of each feature automatically.

import autogluon
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label='Cover_Type', path='ag_ec2021_demo').fit(train_df)
Warning: path already exists! This predictor may overwrite an existing predictor! path="ag_ec2021_demo"
Beginning AutoGluon training ...
AutoGluon will save models to "ag_ec2021_demo/"
AutoGluon Version:  0.2.1b20210511
Train Data Rows:    3750
Train Data Columns: 54
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	7 unique label values:  [5, 4, 1, 6, 3, 2, 7]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
NumExpr defaulting to 8 threads.
Train Data Class Count: 7
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    31462.81 MB
	Train Data (Original)  Memory Usage: 1.62 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Useless Original Features (Count: 4): ['Soil_Type7', 'Soil_Type8', 'Soil_Type15', 'Soil_Type25']
		These features carry no predictive signal and should be manually investigated.
		This is typically a feature which has the same value for all rows.
		These features do not need to be present at inference time.
	Types of features in original data (raw dtype, special dtypes):
		('int', []) : 50 | ['Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology', 'Vertical_Distance_To_Hydrology', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('int', []) : 50 | ['Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology', 'Vertical_Distance_To_Hydrology', ...]
	0.1s = Fit runtime
	50 features in original data used to generate 50 features in processed data.
	Train Data (Processed) Memory Usage: 1.5 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric argument of fit()
Automatically generating train/validation split with holdout_frac=0.13333333333333333, Train Rows: 3250, Val Rows: 500
Fitting 13 L1 models ...
Fitting model: KNeighborsUnif ...
	0.72	 = Validation accuracy score
	0.02s	 = Training runtime
	0.11s	 = Validation runtime
Fitting model: KNeighborsDist ...
	0.744	 = Validation accuracy score
	0.01s	 = Training runtime
	0.1s	 = Validation runtime
Fitting model: NeuralNetFastAI ...
	0.796	 = Validation accuracy score
	8.52s	 = Training runtime
	0.03s	 = Validation runtime
Fitting model: LightGBMXT ...
	0.83	 = Validation accuracy score
	1.93s	 = Training runtime
	0.03s	 = Validation runtime
Fitting model: LightGBM ...
	0.832	 = Validation accuracy score
	3.08s	 = Training runtime
	0.04s	 = Validation runtime
Fitting model: RandomForestGini ...
	0.822	 = Validation accuracy score
	0.85s	 = Training runtime
	0.1s	 = Validation runtime
Fitting model: RandomForestEntr ...
	0.824	 = Validation accuracy score
	1.02s	 = Training runtime
	0.1s	 = Validation runtime
Fitting model: CatBoost ...
	0.812	 = Validation accuracy score
	4.56s	 = Training runtime
	0.0s	 = Validation runtime
Fitting model: ExtraTreesGini ...
	0.802	 = Validation accuracy score
	0.71s	 = Training runtime
	0.1s	 = Validation runtime
Fitting model: ExtraTreesEntr ...
	0.808	 = Validation accuracy score
	0.81s	 = Training runtime
	0.1s	 = Validation runtime
Fitting model: XGBoost ...
	0.816	 = Validation accuracy score
	6.59s	 = Training runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetMXNet ...
	0.8	 = Validation accuracy score
	9.82s	 = Training runtime
	0.12s	 = Validation runtime
Fitting model: LightGBMLarge ...
	0.834	 = Validation accuracy score
	6.4s	 = Training runtime
	0.03s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	0.858	 = Validation accuracy score
	0.35s	 = Training runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 49.1s ...
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("ag_ec2021_demo/")
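
The log above states that the problem was inferred as 'multiclass' because the label column has an integer dtype with few unique values. The sketch below illustrates that kind of rule. It is a simplified heuristic written for this notebook, not AutoGluon's actual implementation; the function name `infer_problem_type` and the `max_classes` threshold are hypothetical.

```python
import pandas as pd

def infer_problem_type(labels: pd.Series, max_classes: int = 20) -> str:
    """Hypothetical heuristic: guess the problem type from the label column."""
    n_unique = labels.nunique()
    if n_unique == 2:
        return "binary"
    # Numeric labels with many distinct values are treated as a regression target.
    if pd.api.types.is_numeric_dtype(labels) and n_unique > max_classes:
        return "regression"
    # Otherwise assume a small, fixed set of classes.
    return "multiclass"

# Integer labels with 7 distinct values, like Cover_Type.
cover_type = pd.Series([5, 4, 1, 6, 3, 2, 7] * 10)
# Float labels with many distinct values, like a continuous measurement.
continuous = pd.Series([1.11 + 0.5 * i for i in range(500)])

print(infer_problem_type(cover_type))   # multiclass
print(infer_problem_type(continuous))   # regression
```

If the automatic guess is wrong for your data, the log message notes that problem_type can be specified manually.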

We can visualize the performance of each model with predictor.leaderboard(). Internally, AutoGluon trains a diverse set of different tabular models and computes a weighted ensemble to combine these models.

                  model  score_val  pred_time_val   fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0   WeightedEnsemble_L2      0.858       0.432053  28.982538                0.000448           0.346764            2       True         14
1         LightGBMLarge      0.834       0.032289   6.397996                0.032289           6.397996            1       True         13
2              LightGBM      0.832       0.040016   3.076502                0.040016           3.076502            1       True          5
3            LightGBMXT      0.830       0.028169   1.926692                0.028169           1.926692            1       True          4
4      RandomForestEntr      0.824       0.102347   1.017105                0.102347           1.017105            1       True          7
5      RandomForestGini      0.822       0.102403   0.848964                0.102403           0.848964            1       True          6
6               XGBoost      0.816       0.011192   6.591112                0.011192           6.591112            1       True         11
7              CatBoost      0.812       0.004262   4.561907                0.004262           4.561907            1       True          8
8        ExtraTreesEntr      0.808       0.102475   0.814237                0.102475           0.814237            1       True         10
9        ExtraTreesGini      0.802       0.102421   0.714676                0.102421           0.714676            1       True          9
10       NeuralNetMXNet      0.800       0.124815   9.818555                0.124815           9.818555            1       True         12
11      NeuralNetFastAI      0.796       0.029656   8.515627                0.029656           8.515627            1       True          3
12       KNeighborsDist      0.744       0.102354   0.012857                0.102354           0.012857            1       True          2
13       KNeighborsUnif      0.720       0.105474   0.017922                0.105474           0.017922            1       True          1
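
To make the idea of a weighted ensemble concrete, here is a minimal sketch that combines the class probabilities of three base models with fixed weights. The probabilities and weights are made up for illustration; how AutoGluon actually chooses the weights is described in [1].

```python
import numpy as np

# Hypothetical class probabilities from three base models
# (4 samples, 3 classes); the values are invented for illustration.
probs_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]])
probs_b = np.array([[0.5, 0.4, 0.1], [0.3, 0.3, 0.4], [0.2, 0.2, 0.6], [0.2, 0.6, 0.2]])
probs_c = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4], [0.5, 0.3, 0.2]])

# Assumed ensemble weights; they are non-negative and sum to 1.
weights = np.array([0.5, 0.3, 0.2])

# Stack to shape (n_models, n_samples, n_classes), then take the weighted sum over models.
stacked = np.stack([probs_a, probs_b, probs_c])
ensemble_probs = np.tensordot(weights, stacked, axes=1)

# The ensemble prediction is the class with the highest combined probability.
ensemble_pred = ensemble_probs.argmax(axis=1)
print(ensemble_pred)  # [0 1 2 1]
```

Because the weights sum to 1, each row of the combined probabilities still sums to 1.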

Evaluation and Prediction

We can also evaluate the model performance on the held-out development dataset by calling .evaluate().

Evaluation: accuracy on test data: 0.8168
Evaluations on test data:
    "accuracy": 0.8168,
    "balanced_accuracy": 0.8170602919410992,
    "mcc": 0.7866393746250545
{'accuracy': 0.8168,
 'balanced_accuracy': 0.8170602919410992,
 'mcc': 0.7866393746250545}
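
For readers unfamiliar with the reported metrics, the sketch below computes accuracy and balanced accuracy (the mean of the per-class recalls) by hand on a tiny made-up example. The helper functions are written for illustration and are not part of AutoGluon.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Made-up labels: class 1 is over-represented and always predicted correctly.
y_true = [1, 1, 1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 1, 2, 1, 3, 1]

print(accuracy(y_true, y_pred))           # 0.75
print(balanced_accuracy(y_true, y_pred))  # (1.0 + 0.5 + 0.5) / 3
```

The two numbers diverge when classes are imbalanced; in the evaluation above they are nearly identical.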

To get the prediction, you may just use predictor.predict().

predictions = predictor.predict(dev_df)
6084     3
927      5
10919    3
8867     2
14455    7
6618     5
9591     7
14307    1
1553     1
3        2
Name: Cover_Type, Length: 1250, dtype: int64

For classification problems, we can also use .predict_proba() to get the predicted probability of each class.

probs = predictor.predict_proba(dev_df)
              1         2         3         4         5         6         7
6084   0.000229  0.000843  0.744518  0.208950  0.000476  0.044587  0.000397
927    0.043397  0.347411  0.000929  0.001604  0.597819  0.006463  0.002378
10919  0.006373  0.060102  0.767284  0.000076  0.126009  0.038330  0.001827
8867   0.170293  0.748936  0.002065  0.000083  0.072658  0.002915  0.003051
14455  0.004558  0.004203  0.000125  0.000081  0.000263  0.000071  0.990699
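
For the rows shown, the label returned by predictor.predict() coincides with the column that has the highest probability. The sketch below rebuilds the five rows above as a DataFrame (named `probs_head` here for illustration) and recovers the labels with pandas.

```python
import pandas as pd

# The five probability rows printed above, copied by hand.
probs_head = pd.DataFrame(
    [
        [0.000229, 0.000843, 0.744518, 0.208950, 0.000476, 0.044587, 0.000397],
        [0.043397, 0.347411, 0.000929, 0.001604, 0.597819, 0.006463, 0.002378],
        [0.006373, 0.060102, 0.767284, 0.000076, 0.126009, 0.038330, 0.001827],
        [0.170293, 0.748936, 0.002065, 0.000083, 0.072658, 0.002915, 0.003051],
        [0.004558, 0.004203, 0.000125, 0.000081, 0.000263, 0.000071, 0.990699],
    ],
    index=[6084, 927, 10919, 8867, 14455],
    columns=[1, 2, 3, 4, 5, 6, 7],
)

# Most likely class per row and the probability assigned to it.
top_class = probs_head.idxmax(axis=1)
confidence = probs_head.max(axis=1)

print(top_class.tolist())  # [3, 5, 3, 2, 7]
```

The confidence column is useful for flagging uncertain cells, such as row 927, where the top class receives less than 60% of the probability mass.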

Load the Predictor

Loading an AutoGluon model is straightforward: we can directly call .load().

predictor_loaded = TabularPredictor.load('ag_ec2021_demo')
Evaluation: accuracy on test data: 0.8168
Evaluations on test data:
    "accuracy": 0.8168,
    "balanced_accuracy": 0.8170602919410992,
    "mcc": 0.7866393746250545
{'accuracy': 0.8168,
 'balanced_accuracy': 0.8170602919410992,
 'mcc': 0.7866393746250545}

Feature Importance

AutoGluon offers a built-in method for calculating the relative importance of each feature based on permutation shuffling. Below, we calculate the feature importance and print the 10 most important features. The importance column is the importance score, that is, the drop in the evaluation metric when the values of that feature are shuffled. The other columns (stddev, p_value, n, p99_high, p99_low) indicate the statistical significance of the calculated score.

importance = predictor.feature_importance(dev_df, subsample_size=500)
Computing feature importance via permutation shuffling for 54 features using 500 rows with 3 shuffle sets...
	104.88s	= Expected runtime (34.96s per shuffle set)
	17.35s	= Actual runtime (Completed 3 of 3 shuffle sets)
                                    importance    stddev   p_value  n   p99_high    p99_low
Elevation                             0.475333  0.029143  0.000625  3   0.642328   0.308339
Horizontal_Distance_To_Roadways       0.085333  0.008327  0.001579  3   0.133046   0.037621
Horizontal_Distance_To_Fire_Points    0.066000  0.002000  0.000153  3   0.077460   0.054540
Horizontal_Distance_To_Hydrology      0.053333  0.013317  0.010078  3   0.129639  -0.022973
Hillshade_9am                         0.023333  0.009238  0.024239  3   0.076266  -0.029599
Wilderness_Area4                      0.018000  0.011136  0.053704  3   0.081808  -0.045808
Hillshade_Noon                        0.016667  0.023861  0.174968  3   0.153391  -0.120058
Aspect                                0.016000  0.014000  0.093162  3   0.096222  -0.064222
Vertical_Distance_To_Hydrology        0.014667  0.003055  0.007078  3   0.032172  -0.002839
Wilderness_Area1                      0.012667  0.004163  0.017088  3   0.036523  -0.011190

From the results, we can see that Elevation is the most important feature. Horizontal_Distance_To_Roadways is the 2nd most important feature.
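
The idea behind permutation shuffling can be sketched in a few lines: shuffle one feature column at a time and measure how much the score drops. The example below uses synthetic data and a stand-in model (`predict`) that thresholds the first feature. Both are assumptions for illustration, and the function is not AutoGluon's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the label depends only on feature 0; feature 1 is pure noise.
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

def predict(X):
    """Stand-in for a trained model: it thresholds feature 0 and ignores feature 1."""
    return (X[:, 0] > 0).astype(int)

def permutation_importance(predict_fn, X, y, n_shuffles=3):
    """Mean and standard deviation of the accuracy drop after shuffling each column."""
    baseline = np.mean(predict_fn(X) == y)
    drops = np.zeros((n_shuffles, X.shape[1]))
    for s in range(n_shuffles):
        for j in range(X.shape[1]):
            X_perm = X.copy()
            # Shuffling column j breaks its relationship with the label.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[s, j] = baseline - np.mean(predict_fn(X_perm) == y)
    return drops.mean(axis=0), drops.std(axis=0, ddof=1)

mean_imp, std_imp = permutation_importance(predict, X, y)
print(mean_imp)  # large drop for feature 0, exactly 0 for feature 1
```

A feature that the model ignores gets an importance of zero, while shuffling the informative feature pushes accuracy toward chance level.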

Achieve Better Performance

The default behavior of AutoGluon is to compute a weighted ensemble of a diverse set of models. Usually, you can achieve better performance via stack ensembling. To enable automated stack ensembling, specify presets="best_quality" when calling .fit(). For more details, you can also check out our provided script. The detailed architecture is described in [1], and the following figure gives an overview of the general architecture.


With .fit(train_df, presets="best_quality"), we are able to reach rank 82 out of 1692 in the competition. To reproduce our number, you may try the command mentioned in link.
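
As a rough illustration of stack ensembling in general, the sketch below trains two deliberately weak base models, collects their out-of-fold predictions, and fits a second-layer model on those predictions. It is a generic toy example on synthetic data with least-squares models. The data, the feature subsets, and the helper `fit_linear` are assumptions, and the sketch does not reproduce AutoGluon's architecture from [1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data with three features.
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef

# Layer 1: two weak base models, each seeing only a subset of the features.
base_feature_sets = [[0, 1], [1, 2]]

# Out-of-fold predictions: each row is predicted by models that never saw it.
folds = np.array_split(rng.permutation(len(X)), 5)
oof = np.zeros((len(X), len(base_feature_sets)))
for val_idx in folds:
    train_idx = np.setdiff1d(np.arange(len(X)), val_idx)
    for m, cols in enumerate(base_feature_sets):
        model = fit_linear(X[train_idx][:, cols], y[train_idx])
        oof[val_idx, m] = model(X[val_idx][:, cols])

# Layer 2: the stacker learns how to combine the base predictions.
stacker = fit_linear(oof, y)
stacked_pred = stacker(oof)

mse_base = [float(np.mean((oof[:, m] - y) ** 2)) for m in range(oof.shape[1])]
mse_stack = float(np.mean((stacked_pred - y) ** 2))
print(mse_base, mse_stack)
```

Using out-of-fold predictions keeps the second layer from simply rewarding base models that overfit their own training rows.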


Solar Radiation Prediction

In the second example, we train a model to predict solar radiation. The original dataset is available in Kaggle Solar Radiation Prediction. The dataset contains columns such as “wind direction”, “wind speed”, “humidity” and “temperature”. The response parameter to be predicted is “Solar_radiation”, stored in the Radiation column. The dataset contains measurements covering four months, and the task is to predict the level of solar radiation. Let’s download and load the dataset.

!wget -O
--2021-05-16 08:44:25--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 523425 (511K) [application/zip]
Saving to: ‘’

SolarPrediction.csv 100%[===================>] 511.16K  --.-KB/s    in 0.007s  

2021-05-16 08:44:25 (76.8 MB/s) - ‘’ saved [523425/523425]
import pandas as pd
df = pd.read_csv('')
train_df, dev_df = train_test_split(df, random_state=100)
         UNIXTime                    Data      Time  Radiation  Temperature  Pressure  Humidity  WindDirection(Degrees)  Speed  TimeSunRise  TimeSunSet
 2664  1474412104   9/20/2016 12:00:00 AM  12:55:04    1039.15           65     30.40        57                    2.26   5.62     06:11:00    18:21:00
12230  1476543319  10/15/2016 12:00:00 AM  04:55:19       1.21           51     30.46        23                  181.58   6.75     06:17:00    17:59:00
11706  1476704422  10/17/2016 12:00:00 AM  01:40:22       1.22           50     30.47        39                  142.56  10.12     06:18:00    17:58:00
12924  1476330025  10/12/2016 12:00:00 AM  17:40:25      28.35           59     30.45        42                  167.42   4.50     06:16:00    18:02:00
27507  1482367563  12/21/2016 12:00:00 AM  14:46:03     637.93           57     30.39        74                   40.94   4.50     06:53:00    17:49:00
 2516  1474457405   9/21/2016 12:00:00 AM  01:30:05       1.21           45     30.39        73                  159.07   3.37     06:11:00    18:20:00
32227  1480723808   12/2/2016 12:00:00 AM  14:10:08     177.19           45     30.34        93                  134.78  11.25     06:42:00    17:42:00
12705  1476396922  10/13/2016 12:00:00 AM  12:15:22    1008.08           65     30.46        46                   71.24   5.62     06:17:00    18:01:00
14992  1475697322   10/5/2016 12:00:00 AM  09:55:22     292.44           55     30.47       101                   18.70   7.87     06:14:00    18:08:00
23615  1478267417   11/4/2016 12:00:00 AM  03:50:17       1.18           44     30.42        38                  176.34   7.87     06:25:00    17:47:00

As in our previous example, we can directly train a predictor with a single .fit() call. The difference is that AutoGluon automatically determines that this is a regression problem. We also set eval_metric='r2' so that models are compared by their R2 score.

predictor = TabularPredictor(label='Radiation', eval_metric='r2', path='ag_ec2021_demo2').fit(train_df)
Warning: path already exists! This predictor may overwrite an existing predictor! path="ag_ec2021_demo2"
Beginning AutoGluon training ...
AutoGluon will save models to "ag_ec2021_demo2/"
AutoGluon Version:  0.2.1b20210511
Train Data Rows:    24514
Train Data Columns: 10
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == float and many unique label-values observed).
	Label info (max, min, mean, stddev): (1601.26, 1.11, 206.52072, 315.54334)
	If 'regression' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    27858.63 MB
	Train Data (Original)  Memory Usage: 7.88 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting DatetimeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', [])                      : 3 | ['Pressure', 'WindDirection(Degrees)', 'Speed']
		('int', [])                        : 3 | ['UNIXTime', 'Temperature', 'Humidity']
		('object', ['datetime_as_object']) : 4 | ['Data', 'Time', 'TimeSunRise', 'TimeSunSet']
	Types of features in processed data (raw dtype, special dtypes):
		('float', [])                : 3 | ['Pressure', 'WindDirection(Degrees)', 'Speed']
		('int', [])                  : 3 | ['UNIXTime', 'Temperature', 'Humidity']
		('int', ['datetime_as_int']) : 4 | ['Data', 'Time', 'TimeSunRise', 'TimeSunSet']
	16.7s = Fit runtime
	10 features in original data used to generate 10 features in processed data.
	Train Data (Processed) Memory Usage: 1.96 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 16.74s ...
AutoGluon will gauge predictive performance using evaluation metric: 'r2'
	To change this, specify the eval_metric argument of fit()
Automatically generating train/validation split with holdout_frac=0.1, Train Rows: 22062, Val Rows: 2452
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif ...
	0.9501	 = Validation r2 score
	0.03s	 = Training runtime
	0.1s	 = Validation runtime
Fitting model: KNeighborsDist ...
	0.9531	 = Validation r2 score
	0.03s	 = Training runtime
	0.1s	 = Validation runtime
Fitting model: LightGBMXT ...
[1000]	train_set's l2: 5825	train_set's r2: 0.941343	valid_set's l2: 6881.24	valid_set's r2: 0.932405
[2000]	train_set's l2: 4818.35	train_set's r2: 0.951483	valid_set's l2: 6360.95	valid_set's r2: 0.937497
[3000]	train_set's l2: 4202.38	train_set's r2: 0.957684	valid_set's l2: 6212.24	valid_set's r2: 0.938993
[4000]	train_set's l2: 3751.34	train_set's r2: 0.962227	valid_set's l2: 6130.43	valid_set's r2: 0.939774
[5000]	train_set's l2: 3396.38	train_set's r2: 0.965805	valid_set's l2: 6110.98	valid_set's r2: 0.939962
[6000]	train_set's l2: 3117.1	train_set's r2: 0.968616	valid_set's l2: 6078.9	valid_set's r2: 0.940272
[7000]	train_set's l2: 2876.17	train_set's r2: 0.971039	valid_set's l2: 6073.82	valid_set's r2: 0.940339
[8000]	train_set's l2: 2666.91	train_set's r2: 0.973145	valid_set's l2: 6064.97	valid_set's r2: 0.940439
[9000]	train_set's l2: 2479.79	train_set's r2: 0.97503	valid_set's l2: 6082.82	valid_set's r2: 0.940253
	0.9405	 = Validation r2 score
	22.47s	 = Training runtime
	0.3s	 = Validation runtime
Fitting model: LightGBM ...
	0.9438	 = Validation r2 score
	2.37s	 = Training runtime
	0.02s	 = Validation runtime
Fitting model: RandomForestMSE ...
[1000]	train_set's l2: 2247.86	train_set's r2: 0.977368	valid_set's l2: 5751.29	valid_set's r2: 0.943489
	0.9436	 = Validation r2 score
	6.91s	 = Training runtime
	0.1s	 = Validation runtime
Fitting model: CatBoost ...
	0.942	 = Validation r2 score
	4.38s	 = Training runtime
	0.0s	 = Validation runtime
Fitting model: ExtraTreesMSE ...
	0.9445	 = Validation r2 score
	2.1s	 = Training runtime
	0.1s	 = Validation runtime
Fitting model: NeuralNetFastAI ...
No improvement since epoch 0: early stopping
	-0.3674	 = Validation r2 score
	12.72s	 = Training runtime
	0.04s	 = Validation runtime
Fitting model: XGBoost ...
	0.9447	 = Validation r2 score
	5.8s	 = Training runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetMXNet ...
	0.9348	 = Validation r2 score
	77.28s	 = Training runtime
	0.12s	 = Validation runtime
Fitting model: LightGBMLarge ...
	0.9445	 = Validation r2 score
	12.47s	 = Training runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	0.9547	 = Validation r2 score
	0.36s	 = Training runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 171.49s ...
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("ag_ec2021_demo2/")

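We can evaluate on the development set by calling .evaluate(). Because we specified R2 as the evaluation metric, the R2 score is reported first. Error metrics such as root_mean_squared_error appear with a negative sign because AutoGluon reports all metrics in a higher-is-better form.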

Evaluation: r2 on test data: 0.9543093262635433
Evaluations on test data:
    "r2": 0.9543093262635433,
    "root_mean_squared_error": -67.7654685984138,
    "mean_squared_error": -4592.158734362603,
    "mean_absolute_error": -24.437721801314463,
    "pearsonr": 0.9768914212616937,
    "median_absolute_error": -1.0411042070388794
{'r2': 0.9543093262635433,
 'root_mean_squared_error': -67.7654685984138,
 'mean_squared_error': -4592.158734362603,
 'mean_absolute_error': -24.437721801314463,
 'pearsonr': 0.9768914212616937,
 'median_absolute_error': -1.0411042070388794}
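
The relationship between these numbers can be checked by hand: in magnitude, root_mean_squared_error is the square root of mean_squared_error, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes the metrics on a tiny made-up example and negates the error metrics to match the sign convention of the output above. The function `regression_report` is a hypothetical helper, not an AutoGluon API.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Compute R2 and sign-flipped error metrics (higher is better for all of them)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    mse = float(np.mean(residual ** 2))
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        # Errors are negated so that a larger value always means a better model.
        "root_mean_squared_error": -float(np.sqrt(mse)),
        "mean_squared_error": -mse,
        "mean_absolute_error": -float(np.mean(np.abs(residual))),
    }

# Made-up targets and predictions; only the last prediction is off (by 2).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

report = regression_report(y_true, y_pred)
print(report)  # r2 is about 0.2, mean_squared_error is -1.0, mean_absolute_error is -0.5
```

Applied to the output above, 67.77 squared is about 4592, which agrees with the reported mean_squared_error up to sign.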

Similarly, we can also measure the feature importance.

importance = predictor.feature_importance(dev_df)
Computing feature importance via permutation shuffling for 10 features using 1000 rows with 3 shuffle sets...
	10.78s	= Expected runtime (3.59s per shuffle set)
	4.0s	= Actual runtime (Completed 3 of 3 shuffle sets)
                        importance    stddev   p_value  n   p99_high    p99_low
UNIXTime                  1.063341  0.042203  0.000262  3   1.305167   0.821515
Time                      0.080480  0.004762  0.000583  3   0.107768   0.053192
Temperature               0.029517  0.001956  0.000730  3   0.040723   0.018311
Data                      0.005320  0.001238  0.008781  3   0.012411  -0.001771
Humidity                  0.004958  0.000768  0.003948  3   0.009356   0.000560
TimeSunRise               0.003704  0.001029  0.012388  3   0.009599  -0.002192
TimeSunSet                0.003685  0.001873  0.038180  3   0.014417  -0.007047
Pressure                  0.000460  0.000689  0.183299  3   0.004408  -0.003487
WindDirection(Degrees)    0.000051  0.001093  0.471685  3   0.006313  -0.006211
Speed                     0.000006  0.000357  0.489087  3   0.002052  -0.002039
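
The p_value, p99_high and p99_low columns are numerically consistent with a one-sided t-test and a 99% t-interval over the n = 3 shuffle sets. The sketch below reproduces the UNIXTime row under that assumption. It is inferred from the printed numbers rather than taken from the AutoGluon source, and it uses closed-form expressions that are valid only for 2 degrees of freedom (n = 3).

```python
import math

def t2_cdf(t):
    """CDF of Student's t distribution with 2 degrees of freedom (closed form)."""
    return 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))

def t2_quantile(p):
    """Inverse CDF of Student's t distribution with 2 degrees of freedom."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))

def summarize(importance, stddev, n=3):
    """Return (p_value, p99_high, p99_low) for n = 3 shuffle sets."""
    if n != 3:
        raise ValueError("the closed forms used here assume n = 3")
    stderr = stddev / math.sqrt(n)
    # One-sided test of the hypothesis that the true importance is zero.
    p_value = 1.0 - t2_cdf(importance / stderr)
    # Two-sided 99% interval uses the 0.995 quantile.
    half_width = t2_quantile(0.995) * stderr
    return p_value, importance + half_width, importance - half_width

# importance and stddev of the UNIXTime row above.
p_value, p99_high, p99_low = summarize(1.063341, 0.042203)
print(p_value, p99_high, p99_low)  # close to 0.000262, 1.305167, 0.821515
```

With only three shuffle sets the intervals are wide, which is why several features with small importance have a p99_low below zero.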

More Information

You may check our website for more information and tutorials. AutoGluon also supports automatically training models on text, image, and multimodal tabular data.


  1. Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. 2020.