xclim: Software ecosystem for climate services

Author(s)


  • Author1 = {"name": "Pascal Bourgault", "affiliation": "Ouranos Inc", "email": "bourgault.pascal@ouranos.ca", "orcid": "0000-0003-1192-0403"}

  • Author2 = {"name": "Travis Logan", "affiliation": "Ouranos Inc", "email": "logan.travis@ouranos.ca", "orcid": "0000-0002-2212-9580"}

  • Author3 = {"name": "David Huard", "affiliation": "Ouranos Inc", "email": "huard.david@ouranos.ca", "orcid": "0000-0003-0311-5498"}

  • Author4 = {"name": "Trevor J. Smith", "affiliation": "Ouranos Inc", "email": "smith.trevorj@ouranos.ca", "orcid": "0000-0001-5393-8359"}


Purpose

This notebook presents xclim, a Python package for easy manipulation and analysis of N-D climate datasets. It details the tools xclim provides and how they simplify and standardize climate analysis workflows. The notebook also demonstrates how xclim functionality can be exposed as a Web Processing Service (WPS) through finch. It was developed for Python 3.7+, but expects a special Python environment and a running instance of finch; the configuration is optimized for running on binder.

Technical contributions

  • Development of a climate analysis library bringing together many different tools, with special attention to community standards (especially CF) and computationally efficient implementations:

    • Numerically efficient climate index algorithms yielding standards-compliant outputs;

    • Model ensemble statistics and robustness metrics;

    • Bias-adjustment algorithms;

    • Other tools not shown in this notebook (subsetting, spatial analogs, calendar conversions).

  • Development of a web service exposing climate analysis processes through OGC’s WPS protocol.

Methodology

This notebook shows three examples of climate analysis workflows:

  1. Calculating climate indices on an ensemble of simulation data, then computing ensemble statistics;

  2. Bias-adjustment of model data.

  3. Calculating climate indices through a web service (finch).

The examples interleave code and markdown cells describing what is being done. They also include some graphics created with matplotlib, to give a clearer idea of what was achieved.

Results

This notebook demonstrates how xclim can help simplify and standardize climate analysis workflows.

Funding

This project benefitted from direct and indirect funding from multiple sources, which are cited in the “Acknowledgements” section below.

Keywords

keywords=["climate", "xarray", "bias-adjustment", "wps"]

Citation

Bourgault, Pascal et al. Ouranos, 2021. xclim: Software ecosystem for climate services. Accessed 2021/05/15 at https://github.com/Ouranosinc/xclim/blob/earthcube-nb/docs/notebooks/PB_01_xclim_Software_ecosystem_for_climate_services.ipynb

Suggested next steps

After reading through this notebook, we recommend looking at xclim's documentation, especially the examples page, where other notebooks extend the examples shown here. xclim offers other features that might also be of interest to readers: a spatial analogs submodule and internationalization utilities, among others. The list of indicators is also a good place to look, and is regularly updated with new algorithms. It covers most indicators from ECAD as well as more complex and specialized ones, like the Fire Weather Indices.

The web service demonstrated here is part of the birdhouse ecosystem and project. It is used by different platforms, such as PAVICS and climatedata.ca.

Acknowledgements

Much of the development work of xclim was done to support climate services delivery at Ouranos and the Canadian Centre for Climate Services, and was funded by Environment and Climate Change Canada (ECCC). It has become a core component of the Power Analytics and Visualization for Climate Science (PAVICS) platform (Ouranos and CRIM 2021). PAVICS has been funded by CANARIE and the Québec Fonds Vert, and through other projects also benefits from the support of the Canadian Foundation for Innovation (CFI) and the Fonds de Recherche du Québec (FRQ). PAVICS is a joint effort of Ouranos and the Centre de Recherche en Informatique de Montréal (CRIM).

Setup

Library import

import xclim
# Also import some submodules for direct access
import xclim.ensembles  # Ensemble creation and statistics
import xclim.sdba  # Bias-adjustment

# Data manipulation
import xarray as xr
import xclim.testing

# For interacting with finch WPS
import birdy

# Visualizations and display
from pprint import pprint
from IPython.display import display
import matplotlib as mpl
import matplotlib.pyplot as plt
from dask.diagnostics import ProgressBar

# For handling file paths
from pathlib import Path

# Some little figure tweaking
mpl.style.use(['seaborn-deep', 'seaborn-darkgrid'])
%matplotlib inline
mpl.rcParams['figure.figsize'] = (12, 5)

Data import

In the first part, we use the “Ouranos standard ensemble of bias-adjusted climate scenarios version 1.0”, bias-adjusted and downscaled data from Ouranos. It includes 11 simulations from different climate models, all produced in the RCP8.5 experiment of the CMIP5 project. More information can be found on this page.

!du -hs data/*.nc
852K	data/ACCESS1-3_rcp85_singlept_1950-2100.nc
852K	data/BNU-ESM_rcp85_singlept_1950-2100.nc
852K	data/CanESM2_rcp85_singlept_1950-2100.nc
852K	data/CMCC-CMS_rcp85_singlept_1950-2100.nc
852K	data/GFDL-ESM2M_rcp85_singlept_1950-2100.nc
840K	data/HadGEM2-CC_rcp85_singlept_1950-2100.nc
852K	data/INM-CM4_rcp85_singlept_1950-2100.nc
848K	data/IPSL-CM5A-LR_rcp85_singlept_1950-2100.nc
848K	data/IPSL-CM5B-LR_rcp85_singlept_1950-2100.nc
852K	data/MPI-ESM-LR_rcp85_singlept_1950-2100.nc
852K	data/NorESM1-M_rcp85_singlept_1950-2100.nc

The second example, dealing with bias-adjustment, will use two sources. The model data comes from the CanESM2 model of CCCma. The reference time series are extracted from the Adjusted and Homogenized Canadian Climate Data (AHCCD), from Environment and Climate Change Canada. Both datasets are distributed under the Open Government Licence (Canada). We chose three weather stations representative of a reasonable range of climatic conditions and extracted time series over those locations from the model data.

Finally, the third section uses two datasets. The first, extracted from ECMWF's ERA5 reanalysis, includes time series of multiple variables at a few locations. The second is a subset of Natural Resources Canada (NRCAN)'s daily gridded dataset derived from observational data. See https://cfs.nrcan.gc.ca/projects/3/4.

Data processing and analysis

Climate indices computation and ensemble statistics

A large part of climate data analytics is based on the computation and comparison of climate indices. Xclim’s main goal is to make it easier for researchers to process climate data and calculate those indices. It does so by providing a large library of climate indices, grouped in categories (CMIP’s “realms”) for convenience, but also by providing tools for all the small tweaks and pre- and post-processing steps that such workflows require.

xclim is built on xarray, which itself makes it easy to scale computations using dask. While many indices are conceptually simple, we aim to provide implementations that balance performance and code complexity. This allows for ease in understanding and extending the code base.

In the following steps we will use xclim to:

  • create an ensemble dataset,

  • compute a few climate indices,

  • calculate ensemble statistics on the results.

Creating the ensemble dataset

In xclim, an “ensemble dataset” is simply an xarray Dataset where multiple realizations of the same climate have been concatenated along a “realization” dimension. In our case here, as presented above, the realizations are in fact simulations from different models, but all have been bias-adjusted with the same reference. All ensemble-related functions are in the ensembles module.
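Conceptually, the result is close to the bare-xarray sketch below (a sketch only: it assumes the files list built in the next cell, and ignores the calendar issue discussed afterwards):

# Hand-rolled "ensemble": open each member and concatenate along a new
# "realization" dimension. This only works when all calendars agree.
members = [xr.open_dataset(p) for p in files]
ens_manual = xr.concat(members, dim='realization')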

Given a list of files, creating the ensemble dataset is straightforward:

# Get a list of files in the `data` directory:
# The sort is not important here; it only ensures a consistent notebook output for automated checks
files = sorted(list(Path('data').glob('*1950-2100.nc')))

# Open files into the ensemble
ens = xclim.ensembles.create_ensemble(files)
ens
<xarray.Dataset>
Dimensions:      (realization: 11, time: 55152)
Coordinates:
  * time         (time) datetime64[ns] 1950-01-01 1950-01-02 ... 2100-12-31
    lat          float32 47.21
    lon          float32 -70.3
  * realization  (realization) int64 0 1 2 3 4 5 6 7 8 9 10
Data variables:
    pr           (realization, time) float32 dask.array<chunksize=(1, 55152), meta=np.ndarray>
    tasmax       (realization, time) float32 dask.array<chunksize=(1, 55152), meta=np.ndarray>
    tasmin       (realization, time) float32 dask.array<chunksize=(1, 55152), meta=np.ndarray>
Attributes:
    Conventions:     CF-1.5
    title:           ACCESS1-3 model output prepared for CMIP5 historical
    history:         CMIP5 compliant file produced from raw ACCESS model outp...
    institution:     CSIRO (Commonwealth Scientific and Industrial Research O...
    source:          ACCESS1-3 2011. Atmosphere: AGCM v1.0 (N96 grid-point, 1...
    redistribution:  Redistribution prohibited. For internal use only.

(Note: as with xarray.open_mfdataset, which is used internally, the dataset attributes are taken from the first file only.)

But how is this different from the bare xarray.concat(members, dim='realization') sketch above? Beyond saving a few keystrokes, the main difference is the handling of incompatible calendars. Different models often use different calendars, and this causes issues when merging or concatenating their data in xarray. Xclim provides xclim.core.calendar.convert_calendar to handle this. For example, this ensemble includes a simulation of the HadGEM2 model, which uses the "360_day" calendar:

ds360 = xr.open_dataset('data/HadGEM2-CC_rcp85_singlept_1950-2100.nc')
ds360.time[59].data
array(cftime.Datetime360Day(1950, 2, 30, 0, 0, 0, 0), dtype=object)
# Convert to the `noleap` calendar, see doc for details on arguments.
ds365 = xclim.core.calendar.convert_calendar(ds360, 'noleap', align_on='date')
ds365.time[59].data
array(cftime.DatetimeNoLeap(1950, 3, 2, 0, 0, 0, 0), dtype=object)

This conversion dropped the invalid days (for example, the 29th and 30th of February 1950) and converted the dtype of the time axis. This does alter the data, but it makes it possible to concatenate different model simulations into a single DataArray.
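Counting time steps confirms that the conversion only drops dates, it never invents them:

# The converted series is slightly shorter: invalid dates were dropped
print(ds360.time.size, '->', ds365.time.size)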

In create_ensemble, all members are converted to a common calendar. By default, it uses the standard numpy/pandas datetime64 data type (a range-limited calendar equivalent to proleptic_gregorian), but for this model ensemble it's better to use noleap (all years have 365 days), as that is the calendar used by most of the models included here.

ens = xclim.ensembles.create_ensemble(files, calendar='noleap')
ens
<xarray.Dataset>
Dimensions:      (realization: 11, time: 55115)
Coordinates:
  * time         (time) object 1950-01-01 00:00:00 ... 2100-12-31 00:00:00
    lat          float32 47.21
    lon          float32 -70.3
  * realization  (realization) int64 0 1 2 3 4 5 6 7 8 9 10
Data variables:
    pr           (realization, time) float32 dask.array<chunksize=(1, 55115), meta=np.ndarray>
    tasmax       (realization, time) float32 dask.array<chunksize=(1, 55115), meta=np.ndarray>
    tasmin       (realization, time) float32 dask.array<chunksize=(1, 55115), meta=np.ndarray>
Attributes:
    Conventions:     CF-1.5
    title:           ACCESS1-3 model output prepared for CMIP5 historical
    history:         CMIP5 compliant file produced from raw ACCESS model outp...
    institution:     CSIRO (Commonwealth Scientific and Industrial Research O...
    source:          ACCESS1-3 2011. Atmosphere: AGCM v1.0 (N96 grid-point, 1...
    redistribution:  Redistribution prohibited. For internal use only.

Compute climate indicators

Climate indicators are stored in xclim.indicators.[realm] and act as functions, taking xarray.DataArray as input variables and returning one (or more) DataArray. Most indicators also take keyword arguments to control various parameters. Under the hood, indicators are python objects, storing information on what the computation does, its inputs and its outputs. For example, let’s look at the tropical_nights indicator:

TN = xclim.indicators.atmos.tropical_nights

# Print general information:
print('Title: ', TN.title)
print('General description: ', TN.abstract, '\n')

# Info on the inputs
print('Inputs description:')
pprint(TN.parameters)
print()

# Info on the output
print('Output description:')
pprint(TN.cf_attrs)
Title:  Tropical nights.
General description:  The number of days with minimum daily temperature above threshold. 

Inputs description:
{'ds': {'default': None,
        'description': 'A dataset with the variables given by name.',
        'kind': <InputKind.DATASET: 70>},
 'freq': {'default': 'YS',
          'description': 'Resampling frequency.',
          'kind': <InputKind.FREQ_STR: 3>},
 'tasmin': {'default': 'tasmin',
            'description': 'Minimum daily temperature.',
            'kind': <InputKind.VARIABLE: 0>,
            'units': '[temperature]'},
 'thresh': {'default': '20.0 degC',
            'description': 'Threshold temperature on which to base evaluation.',
            'kind': <InputKind.QUANTITY_STR: 2>,
            'units': '[temperature]'}}

Output description:
[{'cell_methods': 'time: minimum within days time: sum over days',
  'description': '{freq} number of Tropical Nights : defined as days with '
                 'minimum daily temperature above {thresh}',
  'long_name': 'Number of Tropical Nights (Tmin > {thresh})',
  'standard_name': 'number_of_days_with_air_temperature_above_threshold',
  'units': 'days',
  'var_name': 'tropical_nights'}]

We use an indicator by calling it like a function, with the variables and parameters as arguments. We will look at the seasonal number of "tropical nights" with a threshold of 5°C. For frequency arguments, we use the same syntax as pandas: here QS-DEC resamples to seasons of three (3) months starting in December (so DJF, MAM, JJA, SON).

out = xclim.indicators.atmos.tropical_nights(tasmin=ens.tasmin, thresh='5 degC', freq='QS-DEC')
display(out.data)
out.isel(time=slice(0, 2)).load()
/home/phobos/.conda/envs/xclim-demo/lib/python3.9/site-packages/xclim/indicators/atmos/_temperature.py:83: UserWarning: Variable does not have a `cell_methods` attribute.
  cfchecks.check_valid(tasmin, "cell_methods", "*time: minimum within days*")
        Array          Chunk
Bytes   51.99 kiB      8 B
Shape   (11, 605)      (1, 1)
Count   127972 Tasks   6655 Chunks
Type    float64        numpy.ndarray
<xarray.DataArray 'tropical_nights' (realization: 11, time: 2)>
array([[nan,  9.],
       [nan,  9.],
       [nan, 15.],
       [nan, 15.],
       [nan,  9.],
       [nan, nan],
       [nan, 12.],
       [nan,  2.],
       [nan, 20.],
       [nan, 15.],
       [nan, 17.]])
Coordinates:
  * time         (time) object 1949-12-01 00:00:00 1950-03-01 00:00:00
    lat          float32 47.21
    lon          float32 -70.3
  * realization  (realization) int64 0 1 2 3 4 5 6 7 8 9 10
Attributes:
    units:          days
    cell_methods:    time: minimum within days time: sum over days
    xclim_history:  [2021-06-07 17:30:27] tropical_nights: TROPICAL_NIGHTS(ta...
    standard_name:  number_of_days_with_air_temperature_above_threshold
    long_name:      Number of tropical nights (tmin > 5 degc)
    description:    Seasonal number of tropical nights : defined as days with...

Let’s talk about the many things that happened here:

  1. Input checks

    Before the real computation, xclim performs some checks on the inputs. These can differ between indicators, but the most common are CF-convention and input-frequency checks. The former looks at the standard_name and cell_methods attributes of the input variable and raises warnings when they do not fit what it expects. Here, the standard_name ("air_temperature") is correct, but the variable is missing the cell_methods attribute that would normally indicate that this is indeed the minimum temperature within each day.

    The second type of checks looks at the data itself. Most indicators expect data at a daily frequency, and if the input has a different frequency, the indicator checks will raise an error.

  2. Unit handling

    Another check is performed on the inputs to ensure the correct units were passed. Above, in the parameters dictionary, we saw that tropical_nights expects tasmin and thresh to both be "[temperature]" values. In the function, conversions are made to ensure compatible units: here, tasmin is in Kelvin while thresh was given in Celsius (see the sketch after this list). Under the hood, xclim uses pint to handle this, with small modifications so that CF-compliant unit strings are understood.

  3. Missing values handling

    After the computation, for indicators involving a resampling operation, periods where at least one value was missing (i.e. was a np.NaN) are masked out. Here the first period is masked for all realizations, since there is no data for December 1949. This is the default behaviour, and it can be changed, as we will show further down.

  4. Metadata formatting

    The output’s attributes were formatted to include some information from the passed parameters. Notice how the description includes the given threshold and frequency (Seasonal number of tropical nights […]).

  5. Lazy computing

    This is not a feature inherent to xclim but, by default, create_ensemble opens the ensemble dataset using the dask backend, with one "chunk" per file. This means that all further calculations are lazy: python remembers the operations but doesn't perform them as long as the result is not requested. This is why we needed to call .load() to see the first values of the output. This behaviour of xarray is extremely useful for computationally intensive workflows, as it allows dask to efficiently manage memory and CPU resources. Xclim aims to implement its indicators in a lazy and efficient way.
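As a quick illustration of point 2, the same unit machinery is exposed directly as xclim.core.units.convert_units_to; a minimal sketch using the ensemble opened above:

# Explicit unit conversion, with the same machinery indicators use internally
tasmin_degC = xclim.core.units.convert_units_to(ens.tasmin, 'degC')
print(ens.tasmin.attrs['units'], '->', tasmin_degC.attrs['units'])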

Let's compute a few indicators and merge them into another ensemble dataset. This time, we will switch the missing values handling to a more relaxed method, where a period is masked only when more than 5% of its elements are missing. Moreover, we will set an option so that CF-compliance warnings are logged rather than shown.

If you want to know more about the indicators we are computing, you can use help(indicator) or indicator? (in ipython/jupyter) to print their docstrings, which include most information shown above.

# Set options using a "context". Options set here are reset to their defaults as soon as we exit the "with" block
with xclim.set_options(cf_compliance='log', check_missing='pct', missing_options={'pct': {'tolerance': 0.05}}):

    # Intermediate computation : Approximate mean daily temp by taking the mean of tasmin and tasmax
    tas = xclim.indicators.atmos.tg(tasmin=ens.tasmin, tasmax=ens.tasmax)

    # First day below, first day of the year where tasmin goes below "thresh" for at least "window" days
    # Note that we resample yearly, but starting in July, so that the full winter is included
    out_fda = xclim.indicators.atmos.first_day_below(tasmin=ens.tasmin, thresh='-5 degC', window=3, freq='AS-JUL')

    # Number of frost days : total number of days where tasmin < 0 degC
    out_nfd = xclim.indicators.atmos.frost_days(tasmin=ens.tasmin, freq='AS-JUL')

    # Solid precipitation accumulation : total thickness of solid precipitation (estimated by pr when tas < 0°C)
    out_spa = xclim.indicators.atmos.solid_precip_accumulation(pr=ens.pr, tas=tas, freq='AS-JUL')

    # Cold spell freq : Number of spells where tas is under "thresh" for at least "window" days.
    out_csf = xclim.indicators.atmos.cold_spell_frequency(tas=tas, thresh='-10 degC', window=3, freq='AS-JUL')

out = xr.merge([out_fda, out_nfd, out_spa, out_csf])
out
<xarray.Dataset>
Dimensions:               (realization: 11, time: 152)
Coordinates:
  * time                  (time) object 1949-07-01 00:00:00 ... 2100-07-01 00...
    lat                   float32 47.21
    lon                   float32 -70.3
  * realization           (realization) int64 0 1 2 3 4 5 6 7 8 9 10
Data variables:
    first_day_below       (realization, time) float64 dask.array<chunksize=(1, 1), meta=np.ndarray>
    frost_days            (realization, time) float64 dask.array<chunksize=(1, 1), meta=np.ndarray>
    solidprcptot          (realization, time) float64 dask.array<chunksize=(1, 1), meta=np.ndarray>
    cold_spell_frequency  (realization, time) float64 dask.array<chunksize=(1, 1), meta=np.ndarray>
Attributes:
    units:          
    is_dayofyear:   1
    calendar:       noleap
    cell_methods:   
    xclim_history:  [2021-06-07 17:30:33] first_day_below: FIRST_DAY_BELOW(ta...
    standard_name:  day_of_year
    long_name:      First day of year with temperature below -5 degc
    description:    First day of year with temperature below -5 degc for at l...

Ensemble statistics

All this is quite cool, but we still have a large amount of data to manage. In order to get an idea of the climatic evolution of our indicators and of the model uncertainty associated with it, we can compute ensemble percentiles. Here we extract the 10th, 50th (median) and 90th percentile of the distribution made up of the 11 members of the ensemble.

out_perc = xclim.ensembles.ensemble_percentiles(out, values=[10, 50, 90], split=False)
out_perc
<xarray.Dataset>
Dimensions:               (percentiles: 3, time: 152)
Coordinates:
  * time                  (time) object 1949-07-01 00:00:00 ... 2100-07-01 00...
    lat                   float32 47.21
    lon                   float32 -70.3
  * percentiles           (percentiles) int64 10 50 90
Data variables:
    first_day_below       (time, percentiles) float64 dask.array<chunksize=(1, 3), meta=np.ndarray>
    frost_days            (time, percentiles) float64 dask.array<chunksize=(1, 3), meta=np.ndarray>
    solidprcptot          (time, percentiles) float64 dask.array<chunksize=(1, 3), meta=np.ndarray>
    cold_spell_frequency  (time, percentiles) float64 dask.array<chunksize=(1, 3), meta=np.ndarray>
Attributes:
    units:          
    is_dayofyear:   1
    calendar:       noleap
    cell_methods:   
    xclim_history:  [2021-06-07 17:30:48] : Computation of the percentiles on...
    standard_name:  day_of_year
    long_name:      First day of year with temperature below -5 degc
    description:    First day of year with temperature below -5 degc for at l...

However, the first_day_below indicator is problematic. The data is in a "day of year" (doy) format, an integer starting at 1 on January 1st and going to 365 on December 31st (in our noleap calendar). There might be years where the first day below -5°C happens after December 31st, meaning that taking the percentiles directly will give incorrect results: within a July-to-June period, a doy of 5 (early January) is a later date than a doy of 360 (late December), yet it is numerically smaller. The trick is simply to convert the "day of year" data to a format where the numerical order reflects the temporal order within each period.

The next graph shows the incorrect version. Notice how, for years where some members have values in January (\(0 < doy \le 31\)), those "small" numerical values are classified into the 10th percentile, while they are in fact later dates and should instead be classified into the 90th percentile.

out.first_day_below.plot(hue='realization', color='grey', alpha=0.5)
out_perc.first_day_below.plot(hue='percentiles')
plt.title('Incorrect percentiles computation');
[Figure: ensemble members (grey) and percentiles of first_day_below, incorrect percentile computation]
# Convert "doys" to the number of days elapsed since the time coordinate (which is 1st of July)
fda_days_since = xclim.core.calendar.doy_to_days_since(out.first_day_below)

# Take percentiles now that the numerical ordering is correct
fda_days_since_perc = xclim.ensembles.ensemble_percentiles(fda_days_since, values=[10, 50, 90], split=False)

# Convert back to doys for meaningful values
fda_doy_perc = xclim.core.calendar.days_since_to_doy(fda_days_since_perc)

# Replace the bad data with the good one
out_perc['first_day_below'] = fda_doy_perc

Voilà! Let's enjoy our work!

with ProgressBar():
    out_perc.load()
[########################################] | 100% Completed | 20.9s
for name, var in out_perc.data_vars.items():
    plt.figure()
    var.plot(hue='percentiles')
    plt.title(name)
[Figures: ensemble percentile time series for first_day_below, frost_days, solidprcptot and cold_spell_frequency]

Notice that the doy conversion trick did work for first_day_below; plotting this kind of variable nicely is not straightforward, but that is beyond the point of this notebook.

Bias-adjustment

Besides climate indicators, xclim comes with the sdba (statistical downscaling and bias-adjustment) module. The module follows a "modular" approach instead of implementing published methods as black-box functions. It comes with pre- and post-processing functions in sdba.processing and a few adjustment objects in sdba.adjustment. Adjustment algorithms all conform to the train-adjust scheme, formalized within Adjustment classes. Given a reference time series (ref), historical simulations (hist) and simulations to be adjusted (sim), any bias-adjustment method is applied by first estimating the adjustment factors between the historical simulation and the reference series, and then applying these factors to sim, which could be a future simulation.
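Schematically, every adjustment thus boils down to the same three calls. A minimal sketch (not run here, since ref, hist and sim are only built below; EmpiricalQuantileMapping stands in for any Adjustment class, and nquantiles=20 is an arbitrary choice):

# ADJ = xclim.sdba.EmpiricalQuantileMapping(nquantiles=20)  # pick a method
# ADJ.train(ref, hist)    # estimate adjustment factors from ref vs hist
# scen = ADJ.adjust(sim)  # apply them to the series to be adjusted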

Most algorithms implemented also perform the adjustment separately on temporal groups. For example, for a Quantile Mapping method, one would compute quantiles and adjustment factors for each day of the year individually or within a moving window, so that seasonal variations do not interfere with the result.

Simple quantile mapping

Let's start here with a simple Empirical Quantile Mapping adjustment of pr. Instead of adjusting a future period, we adjust the historical time series itself, so the effect of the adjustment can be appreciated directly. This adjustment method is based on [1].

Multiplicative and Additive modes : In most of the literature, bias-adjustment involving pr is done in a multiplicative mode, as opposed to the additive mode used for temperatures. In xclim.sdba, this is controlled with the kind keyword, which defaults to '+' (additive). This means that sim values are multiplied by the adjustment factors when kind='*', and have them added (or subtracted) when kind='+'.
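Schematically, for a quantile-mapping adjustment, the factors are \(af = F_{ref}^{-1}(q) / F_{hist}^{-1}(q)\) in multiplicative mode and \(af = F_{ref}^{-1}(q) - F_{hist}^{-1}(q)\) in additive mode, where \(F^{-1}\) is the empirical quantile function; they are then applied as \(scen = sim \times af\) or \(scen = sim + af\), respectively.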

# We are opening the datasets with xclim.testing.open_dataset
# a convenience function for accessing xclim's test and example datasets, stored in a github repository.
dsref = xclim.testing.open_dataset('sdba/ahccd_1950-2013.nc')
dssim = xclim.testing.open_dataset('sdba/CanESM2_1950-2100.nc')

# Extract the 1980-2010 period of the daily data
# The data contains 3 time series; we take only one "location"
# `ref` and `hist` must share the same units, so both are converted to mm/d
ref = dsref.pr.sel(time=slice('1980', '2010'), location='Vancouver')
ref = xclim.core.units.convert_units_to(ref, 'mm/d')
hist = dssim.pr.sel(time=slice('1980', '2010'), location='Vancouver')
hist = xclim.core.units.convert_units_to(hist, 'mm/d')

# We want to group data on a day-of-year basis with a 31-day moving window
group_doy_31 = xclim.sdba.Grouper('time.dayofyear', window=31)

# Create the adjustment object
EQM = xclim.sdba.EmpiricalQuantileMapping(nquantiles=15, group=group_doy_31, kind='*')

# Train it
EQM.train(ref, hist)

# Adjust hist
# And we estimate correct adjustment factors by interpolating linearly between quantiles
scen = EQM.adjust(hist, interp='linear')
# Plot the seasonal cycle
fig, ax = plt.subplots()
ref.groupby('time.dayofyear').mean().plot(ax=ax, label='Reference')
hist.groupby('time.dayofyear').mean().plot(ax=ax, label='Model - raw')
scen.groupby('time.dayofyear').mean().plot(ax=ax, label='Model - adjusted')
ax.legend();
[Figure: mean annual cycle of pr — reference, raw model and adjusted model]

To inspect the trained adjustment data, we can access it with EQM.ds:

EQM.ds
<xarray.Dataset>
Dimensions:    (dayofyear: 365, quantiles: 17)
Coordinates:
  * quantiles  (quantiles) float64 1e-06 0.03333 0.1 0.1667 ... 0.9 0.9667 1.0
  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 ... 359 360 361 362 363 364 365
Data variables:
    af         (dayofyear, quantiles) float64 nan 0.0 0.0 ... 1.377 1.684 1.686
    hist_q     (dayofyear, quantiles) float64 0.0 0.001463 ... 15.84 30.81
Attributes:
    _xclim_adjustment:  {"py/object": "xclim.sdba.adjustment.EmpiricalQuantil...
    adj_params:         EmpiricalQuantileMapping(nquantiles=15, kind='*', gro...

More complex adjustment workflows can be created using the many tools exposed by xclim in sdba.processing and sdba.detrending. A complete description of those is beyond the scope of this notebook, but we invite interested users to look at the documentation.
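As a small taste of that modularity, detrending is available as a standalone object. A minimal sketch, assuming the fit/detrend/retrend pattern of sdba.detrending and reusing the hist series from above:

from xclim.sdba.detrending import PolyDetrend

# Fit a degree-1 polynomial trend along time, remove it, then add it back
det = PolyDetrend(degree=1)
fit = det.fit(hist)
hist_det = fit.detrend(hist)       # trend removed
hist_back = fit.retrend(hist_det)  # inverse operation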

Let's make another simple example, using the DetrendedQuantileMapping method. Compared to EmpiricalQuantileMapping, this method normalizes and detrends the inputs before computing the adjustment factors, which ensures a better quantile-to-quantile comparison when the climate change signal is strong; it is based on [2]. The same steps could also be performed explicitly, making use of xclim's modularity, as shown in the doc examples.

# Extract the data.
ref = xclim.core.units.convert_units_to(
    dsref.tasmax.sel(time=slice('1980', '2010'), location='Vancouver'),
    'K'
)
hist = xclim.core.units.convert_units_to(
    dssim.tasmax.sel(time=slice('1980', '2010'), location='Vancouver'),
    'K'
)
sim = xclim.core.units.convert_units_to(
    dssim.tasmax.sel(location='Vancouver'),
    'K'
)


# Adjustment
DQM = xclim.sdba.DetrendedQuantileMapping(nquantiles=15, group=group_doy_31, kind='+')
DQM.train(ref, hist)

scen = DQM.adjust(sim, interp='linear', detrend=1)
fig, ax = plt.subplots()
ref.groupby('time.dayofyear').mean().plot(ax=ax, label='Reference')
hist.groupby('time.dayofyear').mean().plot(ax=ax, label='Simulation - raw')
scen.sel(time=slice('1981', '2010')).groupby('time.dayofyear').mean().plot(ax=ax, label='Simulation - adjusted')
ax.legend();
[Figure: mean annual cycle of tasmax — reference, raw simulation and adjusted simulation]

Just for fun, let’s compute the annual mean of the daily maximum temperature and plot the three time series:

with xclim.set_options(cf_compliance='log'):
    ref_tg = xclim.indicators.atmos.tx_mean(tasmax=ref)
    sim_tg = xclim.indicators.atmos.tx_mean(tasmax=sim)
    scen_tg = xclim.indicators.atmos.tx_mean(tasmax=scen)

fig, ax = plt.subplots()
ref_tg.plot(ax=ax, label='Reference')
sim_tg.plot(ax=ax, label='Model - Raw')
scen_tg.plot(ax=ax, label='Model - Adjusted')
ax.legend();
[Figure: annual mean of daily maximum temperature for the reference, raw and adjusted series]

Finch : xclim as a service

This final part of the notebook demonstrates the use of finch to compute indicators over the web, using the Web Processing Service (WPS) standard. When running this notebook on binder, a local instance of finch is already running on port 5000. We will interact with it using birdy, a lightweight WPS client built upon OWSLib.

What is a WPS?

Standardized by the OGC, WPS describes web requests and responses to and from geospatial processing services. The standard defines how process execution can be requested and how the output from that process is handled. finch provides such a service, bundling xclim's indicators and some other tools into individual processes.
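Since WPS exchanges are plain HTTP, the service can even be queried without any client library. A minimal sketch of the standard GetCapabilities request, assuming the same local finch endpoint used below:

import urllib.request

# OGC-standard request listing every process the server exposes
url = 'http://localhost:5000?service=WPS&request=GetCapabilities'
with urllib.request.urlopen(url) as resp:
    print(resp.read(200))  # start of the XML capabilities document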

With WPS, it's possible to either embed input data in the request or pass a link to the data. In the latter case, the server downloads the data locally before performing its calculations. It's also possible to use OPeNDAP (DAP) links, so the server only streams the data it actually needs. In the following example, we use a subset of ERA5 reanalysis data, available in xclim's testdata GitHub repository.

from birdy import WPSClient

wps = WPSClient('http://localhost:5000')
dataurl = 'https://github.com/Ouranosinc/xclim-testdata/raw/main/ERA5/daily_surface_cancities_1990-1993.nc'
# As in xclim, processes and their inputs are documented 
# by parsing the indicator's attributes:
help(wps.growing_degree_days)
Help on method growing_degree_days in module birdy.client.base:

growing_degree_days(tas=None, check_missing='any', cf_compliance='warn', data_validation='raise', thresh='4.0 degC', freq='YS', missing_options=None, variable=None, output_formats=None) method of birdy.client.base.WPSClient instance
    The sum of degree-days over the threshold temperature.
    
    Parameters
    ----------
    tas : ComplexData:mimetype:`application/x-netcdf`, :mimetype:`application/x-ogc-dods`
        NetCDF Files or archive (tar/zip) containing netCDF files.
    thresh : string
        Threshold temperature on which to base evaluation.
    freq : {'YS', 'MS', 'QS-DEC', 'AS-JUL'}string
        Resampling frequency.
    check_missing : {'any', 'wmo', 'pct', 'at_least_n', 'skip', 'from_context'}string
        Method used to determine which aggregations should be considered missing.
    missing_options : ComplexData:mimetype:`application/json`
        JSON representation of dictionary of missing method parameters.
    cf_compliance : {'log', 'warn', 'raise'}string
        Whether to log, warn or raise when inputs have non-CF-compliant attributes.
    data_validation : {'log', 'warn', 'raise'}string
        Whether to log, warn or raise when inputs fail data validation checks.
    variable : string
        Name of the variable in the NetCDF file.
    
    Returns
    -------
    output_netcdf : ComplexData:mimetype:`application/x-netcdf`
        The indicator values computed on the original input grid.
    output_log : ComplexData:mimetype:`text/plain`
        Collected logs during process run.
    ref : ComplexData:mimetype:`application/metalink+xml; version=4.0`
        Metalink file storing all references to output files.
# Through birdy, the indicator call looks a lot like the one from xclim
# but passing an url instead of an xarray object.
# Also, the `variable` argument tells which variable from the dataset to use.
response = wps.growing_degree_days(
    dataurl,
    thresh='5 degC',
    freq='MS',
    variable='tas'
)

response.get()
growing_degree_daysResponse(
    output_netcdf='http://localhost:5000/outputs/c3603da3-c7d7-11eb-8e02-8cec4b2c9db7/out.nc',
    output_log='http://localhost:5000/outputs/c3603da3-c7d7-11eb-8e02-8cec4b2c9db7/log.txt',
    ref='http://localhost:5000/outputs/c3603da3-c7d7-11eb-8e02-8cec4b2c9db7/input.meta4'
)

The response we received is a list of URLs pointing to output files. Birdy makes it easy to open links to datasets directly in xarray:

out = response.get(asobj=True).output_netcdf
out.growing_degree_days.plot(hue='location');
[Figure: monthly growing degree days at each location]

Chain computations

As the goal of finch is to free the user from most, if not all, computation needs, it includes many other processes for common climate analysis tasks. For example, here we will take Canada-wide maps of tasmax, tasmin and pr, subset them with a square lon/lat bounding box over southern Québec, compute tas as the mean of tasmin and tasmax, and finally compute the annual ratio of liquid to total precipitation.

Each process is fed the output of the previous one, so that no intermediate result is ever downloaded to the client. Only the last output is directly downloaded to a file, using python's urllib.

tasmax_url = "https://github.com/Ouranosinc/xclim-testdata/raw/main/NRCANdaily/nrcan_canada_daily_tasmax_1990.nc"
tasmin_url = "https://github.com/Ouranosinc/xclim-testdata/raw/main/NRCANdaily/nrcan_canada_daily_tasmin_1990.nc"
pr_url = "https://github.com/Ouranosinc/xclim-testdata/raw/main/NRCANdaily/nrcan_canada_daily_pr_1990.nc"

# Subset the maps
resp_tx = wps.subset_bbox(tasmax_url, lon0=-72, lon1=-68, lat0=46, lat1=50)
resp_tn = wps.subset_bbox(tasmin_url, lon0=-72, lon1=-68, lat0=46, lat1=50)
resp_pr = wps.subset_bbox(pr_url, lon0=-72, lon1=-68, lat0=46, lat1=50)

# Compute tas
# We don't need to provide variable names because variables have the same names as the argument names
resp_tg = wps.tg(tasmin=resp_tn.get().output, tasmax=resp_tx.get().output)

# Compute the liquid precipitation ratio: the fraction of total
# precipitation falling as rain, estimated by days with tas above 0°C
resp_lpr = wps.liquid_precip_ratio(
    pr=resp_pr.get().output,
    tas=resp_tg.get().output_netcdf,
    freq='YS'
)
# Plot the liquid precipitation ratio of 1990
out_lpr = resp_lpr.get(asobj=True).output_netcdf
out_lpr.liquid_precip_ratio.plot()
<matplotlib.collections.QuadMesh at 0x7f0194f86700>
[Figure: map of the 1990 liquid precipitation ratio]
# or we can plot it as a map!
import cartopy.crs as ccrs
import cartopy.feature as cfeature

# Set a CRS transformation:
crs = ccrs.PlateCarree()

# Add some map elements
ax = plt.subplot(projection=crs)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.COASTLINE)
out_lpr.liquid_precip_ratio.plot(ax=ax, transform=crs, cmap="rainbow")

plt.show()
[Figure: liquid precipitation ratio over southern Québec, with land, ocean and coastline features]

Output attributes are formatted the same as when xclim is used locally:

out_lpr.liquid_precip_ratio
<xarray.DataArray 'liquid_precip_ratio' (time: 1, lat: 48, lon: 48)>
array([[[0.65363 , 0.651616, ..., 0.63733 , 0.629445],
        [0.651229, 0.650147, ..., 0.629233, 0.636435],
        ...,
        [0.798548, 0.797585, ...,      nan,      nan],
        [0.812997, 0.806936, ...,      nan,      nan]]], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 1990-01-01
  * lat      (lat) float32 49.96 49.88 49.79 49.71 ... 46.29 46.21 46.12 46.04
  * lon      (lon) float32 -71.96 -71.88 -71.79 -71.71 ... -68.21 -68.12 -68.04
Attributes:
    units:          
    cell_methods:   tas: tasmin: time: minimum within days tasmax: time: maxi...
    xclim_history:  pr: \ntas: \n[2021-06-07 17:32:05] liquid_precip_ratio: L...
    long_name:      Ratio of rainfall to total precipitation
    description:    Annual ratio of rainfall to total precipitation. rainfall...

Ensemble statistics

In addition to the indicators and the subsetting processes, finch wraps xclim's ensemble statistics functionality in all-in-one processes. All indicators are provided in several ensemble-subset processes, where the inputs are subsetted before the indicator is computed and ensemble statistics are taken. For example, ensemble_grid_point_growing_season_length computes growing_season_length over a single grid point for all members of the specified ensemble. However, at least in the current version, ensemble datasets are hardcoded in the configuration of the server instance, and one can't send their own list of members.

In order not to overload Ouranos' servers, where these hardcoded datasets are stored, no example of this is run by default here, but interested users are invited to uncomment the following lines to see for themselves.

The ensemble data used by this example is a bias-adjusted and downscaled product from the Pacific Climate Impacts Consortium (PCIC). It is described here.

help(wps.ensemble_grid_point_max_n_day_precipitation_amount)
Help on method ensemble_grid_point_max_n_day_precipitation_amount in module birdy.client.base:

ensemble_grid_point_max_n_day_precipitation_amount(lat, lon, rcp, check_missing='any', cf_compliance='warn', data_validation='raise', start_date=None, end_date=None, ensemble_percentiles='10,50,90', dataset_name=None, models='24MODELS', window=1, freq='YS', missing_options=None, output_format='netcdf', output_formats=None) method of birdy.client.base.WPSClient instance
    Calculate the n-day rolling sum of the original daily total precipitation series and determine the maximum value over each period.
    
    Parameters
    ----------
    lat : string
        Latitude coordinate. Accepts a comma separated list of floats for multiple grid cells.
    lon : string
        Longitude coordinate. Accepts a comma separated list of floats for multiple grid cells.
    start_date : string
        Initial date for temporal subsetting. Can be expressed as year (%Y), year-month (%Y-%m) or year-month-day(%Y-%m-%d). Defaults to first day in file.
    end_date : string
        Final date for temporal subsetting. Can be expressed as year (%Y), year-month (%Y-%m) or year-month-day(%Y-%m-%d). Defaults to last day in file.
    ensemble_percentiles : string
        Ensemble percentiles to calculate for input climate simulations. Accepts a comma separated list of integers.
    dataset_name : {'bccaqv2'}string
        Name of the dataset from which to get netcdf files for inputs.
    rcp : {'rcp26', 'rcp45', 'rcp85'}string
        Representative Concentration Pathway (RCP)
    models : {'24MODELS', 'PCIC12', 'BNU-ESM', 'CCSM4', 'CESM1-CAM5', 'CNRM-CM5', 'CSIRO-Mk3-6-0', 'CanESM2', 'FGOALS-g2', 'GFDL-CM3', ...}string
        When calculating the ensemble, include only these models. By default, all 24 models are used.
    window : integer
        Window size in days.
    freq : {'YS', 'MS', 'QS-DEC', 'AS-JUL'}string
        Resampling frequency.
    check_missing : {'any', 'wmo', 'pct', 'at_least_n', 'skip', 'from_context'}string
        Method used to determine which aggregations should be considered missing.
    missing_options : ComplexData:mimetype:`application/json`
        JSON representation of dictionary of missing method parameters.
    cf_compliance : {'log', 'warn', 'raise'}string
        Whether to log, warn or raise when inputs have non-CF-compliant attributes.
    data_validation : {'log', 'warn', 'raise'}string
        Whether to log, warn or raise when inputs fail data validation checks.
    output_format : {'netcdf', 'csv'}string
        Choose in which format you want to recieve the result
    
    Returns
    -------
    output : ComplexData:mimetype:`application/x-netcdf`, :mimetype:`application/zip`
        The format depends on the 'output_format' input parameter.
    output_log : ComplexData:mimetype:`text/plain`
        Collected logs during process run.
## Get the maximal 5-day precipitation amount for Niagara Falls between 2040 and 2070,
## for the 12 rcp45 datasets bias-adjusted and downscaled with BCCAQv2 by PCIC.
## Careful : this call will take a long time to succeed!

#resp_ens = wps.ensemble_grid_point_max_n_day_precipitation_amount(
#    lat=43.08,
#    lon=-79.07,
#    start_date='2040-01-01',
#    end_date='2070-12-31',
#    ensemble_percentiles=[10, 50, 90],
#    rcp='rcp45',
#    models='PCIC12',
#    window=5,
#    freq='YS',
#    data_validation='warn',
#)
#resp_ens.get(asobj=True).output

This concludes our xclim and finch tour! We hope these tools can be useful in your workflows. The development of both is ongoing; visit us at our GitHub pages with any questions or if you want to contribute!

xclim

finch

References

Bias-adjustment

[1] Dequé, M. (2007). Frequency of precipitation and temperature extremes over France in an anthropogenic scenario: Model results and statistical correction according to observed values. Global and Planetary Change, 57(1–2), 16–26. https://doi.org/10.1016/j.gloplacha.2006.11.030

[2] Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate, 28(17), 6938–6959. https://doi.org/10.1175/JCLI-D-14-00754.1

Software used

xarray 0.18.2 : Stephan Hoyer, Joe Hamman, Maximilian Roos, keewis, Deepak Cherian, Clark Fitzgerald, … Benoit Bovy. (2021, May 19). pydata/xarray: v0.18.2 (Version v0.18.2). Zenodo. http://doi.org/10.5281/zenodo.4774304

numpy : Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2

dask : Dask Development Team (2016). Dask: Library for dynamic task scheduling. URL: https://dask.org

cftime 1.4 : Jeff Whitaker, Constantine Khrulev, Spencer Clark, Joe Hamman, Bill Little, Ryan May, … Stefan. (2021, January 31). Unidata/cftime: version 1.4.0 release (Version v1.4.0rel). Zenodo. http://doi.org/10.5281/zenodo.4484141