cf-xarray: Scale your analysis across datasets with less data wrangling and more metadata handling¶

Author(s)¶

Author1 = {"name": "Mattia Almansi", "affiliation": "National Oceanography Centre", "email": "mattia.almansi@noc.ac.uk", "orcid": "0000-0001-6849-3647"}
Author2 = {"name": "Deepak Cherian", "affiliation": "National Center for Atmospheric Research", "email": "deepak@cherian.net", "orcid": "0000-0002-6861-8734"}
Author3 = {"name": "Pascal Bourgault", "affiliation": "Ouranos Inc.", "email": "bourgault.pascal@ouranos.ca", "orcid": "0000-0003-1192-0403"}

This notebook demonstrates how the cf-xarray Python package (Cherian et al., 2021) helps climate data scientists process CF-compliant datasets from a variety of sources. Under the hood, cf-xarray decodes and uses the metadata defined by the widely adopted Climate and Forecast (CF) conventions. As a result, workflows built on cf-xarray do not need to know arbitrary, dataset-specific metadata such as variable names.

Xarray (Hoyer et al., 2021) is a Python package for convenient analysis of labelled multidimensional data, letting users leverage metadata such as dimension names and coordinate labels. Xarray provides two core data structures:

DataArray: a container wrapping a single multidimensional array with metadata.
Dataset: a dict-like container of multiple DataArrays.
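
A brief illustration of both structures, with made-up names and values (`temp`, `salt`, `lat`, `lon` are purely illustrative):

```python
import numpy as np
import xarray as xr

# A DataArray: one array plus dimension names, coordinate labels and attributes.
temp = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [0.0, 1.0, 2.0]},
    attrs={"units": "degC"},
    name="temp",
)

# A Dataset: a dict-like collection of DataArrays that share coordinates.
ds = xr.Dataset({"temp": temp, "salt": temp * 0 + 35.0})

# Metadata-aware operations refer to dimensions and labels by name.
zonal_mean = ds["temp"].mean("lon")
row = ds["temp"].sel(lat=20.0)
```

Note that `mean("lon")` refers to the dimension by name rather than by axis position; cf-xarray extends this idea so that CF keys can stand in for the names themselves.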

cf-xarray uses Xarray’s plugin interface, or “accessor”, to provide extensive functionality on both Datasets and DataArrays under the .cf namespace.
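
To illustrate how the accessor mechanism works, here is a toy accessor registered with xarray's public `xr.register_dataset_accessor` decorator. It is a simplified sketch, not cf-xarray's implementation: the accessor name `units_demo` and its `find` method are hypothetical, and cf-xarray applies many more criteria (`standard_name`, `axis`, `units`, and others) under its own `.cf` name.

```python
import xarray as xr


# Hypothetical accessor name "units_demo"; cf-xarray registers its own under "cf".
@xr.register_dataset_accessor("units_demo")
class UnitsDemoAccessor:
    def __init__(self, xarray_obj: xr.Dataset):
        # xarray passes the Dataset the accessor is attached to.
        self._obj = xarray_obj

    def find(self, units: str) -> list:
        """Return sorted names of variables whose 'units' attribute equals `units`."""
        return sorted(
            str(name)
            for name, var in self._obj.variables.items()
            if var.attrs.get("units") == units
        )


ds = xr.Dataset(coords={"lon_rho": ("lon_rho", [0.0, 1.0], {"units": "degrees_east"})})
print(ds.units_demo.find("degrees_east"))  # ['lon_rho']
```

Because the lookup is driven by attributes rather than names, the same call would find the longitude variable whether it is called `lon_rho`, `lon`, or anything else.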

For example, the zonal average of an Xarray Dataset ds is seamlessly calculated as ds.cf.mean("longitude") on any CF-compliant dataset, regardless of the actual name of the “longitude” variable (e.g., "lon", "lon_rho", "long", …).

Technical contributions¶

Development of cf-xarray, an extension library that

adds awareness of the Climate and Forecast (CF) conventions to core Xarray functionality;
provides utility functions with minimal dependencies, making it easy to integrate CF-aware functionality into other packages such as xESMF (for regridding).

Methodology¶

This notebook can be executed in a pre-configured interactive environment at: https://binder.pangeo.io/v2/gh/malmans2/cf-xarray-earthcube/main?filepath=DC_01_cf-xarray.ipynb

Results¶

This notebook contains use case examples demonstrating the following functionalities of cf-xarray:

Seamless analysis of various CF-compliant datasets.
Standardization of datasets to comply with CF conventions.
Inference of grid-cell coordinates and bounds using CF conventions.
Integration with other libraries (xESMF).

Funding¶

N/A

Keywords¶


keywords=["cf-conventions", "xarray", "netcdf"]

Citation¶

Almansi, M., Cherian, D., and Bourgault, P. (2021). cf-xarray: Scale your analysis across datasets with less data wrangling and more metadata handling. 2021 EarthCube Annual Meeting. Accessed 14/5/2021 at https://github.com/malmans2/cf-xarray-earthcube

Work In Progress - improvements¶

N/A

Suggested next steps¶

Datasets are often not perfectly CF-compliant. Here we work around these deficiencies using assign_coordinates_and_cell_measures. The cf-xarray developers are considering adding heuristics to guess such missing metadata attributes, possibly using other metadata conventions such as SGRID.

Acknowledgements¶

We acknowledge contributions from all cf-xarray contributors: https://github.com/xarray-contrib/cf-xarray/graphs/contributors. We also thank MetPy for providing inspiration and several of the criteria used to identify CF variables. Discussions with Jon Thielen were instrumental in the development of cf-xarray. Finally, we acknowledge the Pangeo project for collating and providing a large number of cloud-hosted datasets that motivated this work, and for provoking, enabling, and fostering the discussions that led to the development of this project.

EarthCube 2021 Call for Notebooks

Table of Contents

Purpose¶

Technical contributions¶

Methodology¶

Results¶

Funding¶

Keywords¶

Citation¶

Work In Progress - improvements¶

Suggested next steps¶

Acknowledgements¶

Setup¶

Library import¶

Parameter definitions¶

Data import¶

MOM6¶

CMIP6 Ocean¶

CMIP6 Ice¶

NCEP¶

Data processing and analysis¶

Overview of cf-xarray¶

`.cf` is an entrypoint for cf-xarray functionality¶

Use CF metadata in standard xarray methods¶

Dictionaries mapping CF keys to variable names¶

Dealing with incomplete metadata¶

Indexing using CF keys¶

Automagic plotting¶

CF keys expansion¶

Use case examples¶

Seamlessly extract statistics from CF-compliant datasets¶

Access variables through the `.cf` interface¶

Standardize datasets using cf_xarray¶

Regridding with xESMF¶

Conclusion¶

References¶
