{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6508e09a",
   "metadata": {},
   "source": [
    "# cf-xarray: Scale your analysis across datasets with less data wrangling and more metadata handling\n",
    "\n",
    "<img src=\"https://github.com/xarray-contrib/cf-xarray/blob/main/doc/_static/full-logo.png?raw=true\" width=\"40%\" align=\"center\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18a51619",
   "metadata": {},
   "source": [
    "## Author(s)\n",
    "\n",
    "- Author1 = {\"name\": \"Mattia Almansi\", \"affiliation\": \"National Oceanography\n",
    "  Centre\", \"email\": \"mattia.almansi@noc.ac.uk\", \"orcid\": \"0000-0001-6849-3647\"}\n",
    "- Author2 = {\"name\": \"Deepak Cherian\", \"affiliation\": \"National Center for\n",
    "  Atmospheric Research\", \"email\": \"deepak@cherian.net\", \"orcid\":\n",
    "  \"0000-0002-6861-8734\"}\n",
    "- Author3 = {\"name\": \"Pascal Bourgault\", \"affiliation\": \"Ouranos Inc.\", \"email\":\n",
    "  \"bourgault.pascal@ouranos.ca\", \"orcid\": \"0000-0003-1192-0403\"}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7846836",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#cf-xarray:-Scale-your-analysis-across-datasets-with-less-data-wrangling-and-more-metadata-handling\" data-toc-modified-id=\"cf-xarray:-Scale-your-analysis-across-datasets-with-less-data-wrangling-and-more-metadata-handling-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>cf-xarray: Scale your analysis across datasets with less data wrangling and more metadata handling</a></span><ul class=\"toc-item\"><li><span><a href=\"#Author(s)\" data-toc-modified-id=\"Author(s)-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Author(s)</a></span></li><li><span><a href=\"#Purpose\" data-toc-modified-id=\"Purpose-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Purpose</a></span></li><li><span><a href=\"#Technical-contributions\" data-toc-modified-id=\"Technical-contributions-1.3\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>Technical contributions</a></span></li><li><span><a href=\"#Methodology\" data-toc-modified-id=\"Methodology-1.4\"><span class=\"toc-item-num\">1.4&nbsp;&nbsp;</span>Methodology</a></span></li><li><span><a href=\"#Results\" data-toc-modified-id=\"Results-1.5\"><span class=\"toc-item-num\">1.5&nbsp;&nbsp;</span>Results</a></span></li><li><span><a href=\"#Funding\" data-toc-modified-id=\"Funding-1.6\"><span class=\"toc-item-num\">1.6&nbsp;&nbsp;</span>Funding</a></span></li><li><span><a href=\"#Keywords\" data-toc-modified-id=\"Keywords-1.7\"><span class=\"toc-item-num\">1.7&nbsp;&nbsp;</span>Keywords</a></span></li><li><span><a href=\"#Citation\" data-toc-modified-id=\"Citation-1.8\"><span class=\"toc-item-num\">1.8&nbsp;&nbsp;</span>Citation</a></span></li><li><span><a href=\"#Work-In-Progress---improvements\" data-toc-modified-id=\"Work-In-Progress---improvements-1.9\"><span class=\"toc-item-num\">1.9&nbsp;&nbsp;</span>Work In Progress - improvements</a></span></li><li><span><a href=\"#Suggested-next-steps\" data-toc-modified-id=\"Suggested-next-steps-1.10\"><span 
class=\"toc-item-num\">1.10&nbsp;&nbsp;</span>Suggested next steps</a></span></li><li><span><a href=\"#Acknowledgements\" data-toc-modified-id=\"Acknowledgements-1.11\"><span class=\"toc-item-num\">1.11&nbsp;&nbsp;</span>Acknowledgements</a></span></li></ul></li><li><span><a href=\"#Setup\" data-toc-modified-id=\"Setup-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Setup</a></span><ul class=\"toc-item\"><li><span><a href=\"#Library-import\" data-toc-modified-id=\"Library-import-2.1\"><span class=\"toc-item-num\">2.1&nbsp;&nbsp;</span>Library import</a></span></li></ul></li><li><span><a href=\"#Parameter-definitions\" data-toc-modified-id=\"Parameter-definitions-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Parameter definitions</a></span></li><li><span><a href=\"#Data-import\" data-toc-modified-id=\"Data-import-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>Data import</a></span><ul class=\"toc-item\"><li><span><a href=\"#MOM6\" data-toc-modified-id=\"MOM6-4.1\"><span class=\"toc-item-num\">4.1&nbsp;&nbsp;</span>MOM6</a></span></li><li><span><a href=\"#CMIP6-Ocean\" data-toc-modified-id=\"CMIP6-Ocean-4.2\"><span class=\"toc-item-num\">4.2&nbsp;&nbsp;</span>CMIP6 Ocean</a></span></li><li><span><a href=\"#CMIP6-Ice\" data-toc-modified-id=\"CMIP6-Ice-4.3\"><span class=\"toc-item-num\">4.3&nbsp;&nbsp;</span>CMIP6 Ice</a></span></li><li><span><a href=\"#NCEP\" data-toc-modified-id=\"NCEP-4.4\"><span class=\"toc-item-num\">4.4&nbsp;&nbsp;</span>NCEP</a></span></li></ul></li><li><span><a href=\"#Data-processing-and-analysis\" data-toc-modified-id=\"Data-processing-and-analysis-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>Data processing and analysis</a></span><ul class=\"toc-item\"><li><span><a href=\"#Overview-of-cf-xarray\" data-toc-modified-id=\"Overview-of-cf-xarray-5.1\"><span class=\"toc-item-num\">5.1&nbsp;&nbsp;</span>Overview of cf-xarray</a></span><ul class=\"toc-item\"><li><span><a 
href=\"#.cf-is-an-entrypoint-for-cf-xarray-functionality\" data-toc-modified-id=\".cf-is-an-entrypoint-for-cf-xarray-functionality-5.1.1\"><span class=\"toc-item-num\">5.1.1&nbsp;&nbsp;</span><code>.cf</code> is an entrypoint for cf-xarray functionality</a></span></li><li><span><a href=\"#Use-CF-metadata-in-standard-xarray-methods\" data-toc-modified-id=\"Use-CF-metadata-in-standard-xarray-methods-5.1.2\"><span class=\"toc-item-num\">5.1.2&nbsp;&nbsp;</span>Use CF metadata in standard xarray methods</a></span></li><li><span><a href=\"#Dictionaries-mapping-CF-keys-to-variable-names\" data-toc-modified-id=\"Dictionaries-mapping-CF-keys-to-variable-names-5.1.3\"><span class=\"toc-item-num\">5.1.3&nbsp;&nbsp;</span>Dictionaries mapping CF keys to variable names</a></span></li><li><span><a href=\"#Dealing-with-incomplete-metadata\" data-toc-modified-id=\"Dealing-with-incomplete-metadata-5.1.4\"><span class=\"toc-item-num\">5.1.4&nbsp;&nbsp;</span>Dealing with incomplete metadata</a></span></li><li><span><a href=\"#Indexing-using-CF-keys\" data-toc-modified-id=\"Indexing-using-CF-keys-5.1.5\"><span class=\"toc-item-num\">5.1.5&nbsp;&nbsp;</span>Indexing using CF keys</a></span></li><li><span><a href=\"#Automagic-plotting\" data-toc-modified-id=\"Automagic-plotting-5.1.6\"><span class=\"toc-item-num\">5.1.6&nbsp;&nbsp;</span>Automagic plotting</a></span></li><li><span><a href=\"#CF-keys-expansion\" data-toc-modified-id=\"CF-keys-expansion-5.1.7\"><span class=\"toc-item-num\">5.1.7&nbsp;&nbsp;</span>CF keys expansion</a></span></li></ul></li><li><span><a href=\"#Use-case-examples\" data-toc-modified-id=\"Use-case-examples-5.2\"><span class=\"toc-item-num\">5.2&nbsp;&nbsp;</span>Use case examples</a></span><ul class=\"toc-item\"><li><span><a href=\"#Seamlessly-extract-statistics-from-CF-compliant-datasets\" data-toc-modified-id=\"Seamlessly-extract-statistics-from-CF-compliant-datasets-5.2.1\"><span class=\"toc-item-num\">5.2.1&nbsp;&nbsp;</span>Seamlessly extract 
statistics from CF-compliant datasets</a></span></li><li><span><a href=\"#Standardize-datasets\" data-toc-modified-id=\"Standardize-datasets-5.2.2\"><span class=\"toc-item-num\">5.2.2&nbsp;&nbsp;</span>Standardize datasets</a></span></li><li><span><a href=\"#Regridding-with-xESMF\" data-toc-modified-id=\"Regridding-with-xESMF-5.2.3\"><span class=\"toc-item-num\">5.2.3&nbsp;&nbsp;</span>Regridding with xESMF</a></span></li></ul></li></ul></li></ul></div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0885da1",
   "metadata": {},
   "source": [
    "## Purpose\n",
    "\n",
    "This notebook demonstrates how the **cf-xarray** Python package (Cherian et\n",
    "al, 2021) helps climate data scientists process CF-compliant datasets\n",
    "from a variety of sources. Under the hood, cf-xarray decodes and makes use of\n",
    "the widely adopted [Climate and Forecast (CF) conventions](https://cfconventions.org/).\n",
    "Therefore, workflows integrating cf-xarray do not need knowledge of arbitrary\n",
    "dataset-specific metadata such as variable names.\n",
    "\n",
    "Xarray (Hoyer et al, 2021) is a Python package that enables easy and convenient\n",
    "labelled data analytics by allowing users to leverage metadata such as dimension\n",
    "names and coordinate labels. Xarray provides two core data structures:\n",
    "\n",
    "- DataArray: a container wrapping a single multidimensional array with metadata.\n",
    "- Dataset: a dict-like container of multiple DataArrays.\n",
    "\n",
    "cf-xarray uses Xarray's plugin interface, or \"accessor\", to provide extensive\n",
    "functionality on both Datasets and DataArrays under the `.cf` namespace.\n",
    "\n",
    "For example, the zonal average of an Xarray Dataset `ds` is seamlessly\n",
    "calculated as `ds.cf.mean(\"longitude\")` on any CF-compliant dataset, regardless\n",
    "of the actual name of the \"longitude\" variable (e.g., `\"lon\"`, `\"lon_rho\"`,\n",
    "`\"long\"`, ...).\n",
    "\n",
    "## Technical contributions\n",
    "\n",
    "Development of cf-xarray, an extension library that\n",
    "\n",
    "1. adds awareness of the Climate and Forecast (CF) conventions to core Xarray\n",
    "   functionality.\n",
    "1. provides utility functions with minimal dependencies, allowing easy\n",
    "   integration of CF-aware functionality in other packages such as xESMF (for\n",
    "   regridding).\n",
    "\n",
    "## Methodology\n",
    "\n",
    "This notebook can be executed in a pre-configured interactive environment at:\n",
    "https://binder.pangeo.io/v2/gh/malmans2/cf-xarray-earthcube/main?filepath=DC_01_cf-xarray.ipynb\n",
    "\n",
    "## Results\n",
    "\n",
    "This notebook contains use case examples demonstrating the following\n",
    "functionalities of cf-xarray:\n",
    "\n",
    "1. Seamless analysis of various CF-compliant datasets.\n",
    "2. Standardization of datasets to comply with CF conventions.\n",
    "3. Inference of grid-cell coordinates and bounds using CF conventions.\n",
    "4. Integration with other libraries (xESMF).\n",
    "\n",
    "## Funding\n",
    "\n",
    "N/A\n",
    "\n",
    "## Keywords\n",
    "\n",
    "keywords=[\"cf-conventions\", \"xarray\", \"netcdf\"]\n",
    "\n",
    "## Citation\n",
    "\n",
    "Almansi, M., Cherian, D., and Bourgault, P. (2021). cf-xarray: Scale your\n",
    "analysis across datasets with less data wrangling and more metadata handling.\n",
    "_2021 EarthCube Annual Meeting_. Accessed 14/5/2021 at\n",
    "https://github.com/malmans2/cf-xarray-earthcube\n",
    "\n",
    "## Work In Progress - improvements\n",
    "\n",
    "N/A\n",
    "\n",
    "## Suggested next steps\n",
    "\n",
    "It is common for datasets to not be perfectly CF-compliant. Here we work around\n",
    "these deficiencies using `assign_coordinates_and_cell_measures`, a helper\n",
    "function defined in this notebook. The cf-xarray developers are considering\n",
    "adding heuristics to guess such metadata attributes, possibly using other\n",
    "metadata conventions such as SGRID.\n",
    "\n",
    "## Acknowledgements\n",
    "\n",
    "We acknowledge contributions from all cf-xarray contributors:\n",
    "https://github.com/xarray-contrib/cf-xarray/graphs/contributors. We also\n",
    "acknowledge MetPy for providing inspiration and various criteria for identifying\n",
    "CF variables. Discussions with Jon Thielen were instrumental in the development of\n",
    "cf-xarray. We also acknowledge the Pangeo project for collating and providing an\n",
    "immense number of datasets on the cloud that motivated this work, as well as\n",
    "provoking, enabling, and fostering discussions that led to the development of\n",
    "this project.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "280bb0e9",
   "metadata": {},
   "source": [
    "# Setup\n",
    "\n",
    "## Library import\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8c201b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The package demonstrated here\n",
    "import cf_xarray as cfxr\n",
    "\n",
    "# For parallelization\n",
    "import dask\n",
    "\n",
    "# For loading shared data\n",
    "import intake\n",
    "\n",
    "# Visualizations\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# For basic data manipulation\n",
    "import numpy as np\n",
    "import xarray as xr\n",
    "\n",
    "# For regridding\n",
    "import xesmf as xe\n",
    "\n",
    "# silence a minor warning\n",
    "dask.config.set(**{\"array.slicing.split_large_chunks\": False})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2616704",
   "metadata": {},
   "source": [
    "# Parameter definitions\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a12a533",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Paths and urls pointing to data\n",
    "MOM6_GRID_PATH = \"./data/ocean_grid_sym_OM4_05.nc\"\n",
    "MOM6_DATA_URL = \"http://35.188.34.63:8080/thredds/dodsC/OM4p5/ocean_monthly_z.200301-200712.nc4\"\n",
    "CMIP6_OCE_CATALOG = \"https://storage.googleapis.com/cmip6/pangeo-cmip6.json\"\n",
    "CMIP6_OCE_EXPERIMENT = dict(\n",
    "    table_id=\"Omon\",\n",
    "    grid_label=\"gn\",\n",
    "    source_id=\"ACCESS-CM2\",\n",
    "    # source_id=\"GFDL-CM4\",\n",
    "    member_id=\"r1i1p1f1\",\n",
    "    experiment_id=[\"historical\"],\n",
    "    variable_id=[\"thetao\", \"volcello\"],\n",
    ")\n",
    "CMIP6_ICE_PATH = \"data/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc\"\n",
    "NCEP_PATH = \"data/air_temperature.nc\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "811b80fb",
   "metadata": {},
   "source": [
    "# Data import\n",
    "\n",
    "This notebook uses a variety of publicly available data:\n",
    "\n",
    "- `ds_mom6`: Data from the Modular Ocean Model - v6 (Adcroft et al, 2019)\n",
    "- `ds_cmip6_oce`: Ocean data from the Coupled Model Intercomparison Project -\n",
    "  Phase 6\n",
    "- `ds_cmip6_ice`: Ice data from the Coupled Model Intercomparison Project -\n",
    "  Phase 6\n",
    "- `ds_ncep`: Data from the National Centers for Environmental Prediction\n",
    "  Reanalysis (Kalnay et al, 1996)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60d94c98",
   "metadata": {},
   "outputs": [],
   "source": [
    "def assign_coordinates_and_cell_measures(ds):\n",
    "    \"\"\"\n",
    "    Add missing CF metadata (coordinates and cell measures) in place.\n",
    "\n",
    "    Fully CF-compliant datasets do not need this pre-processing.\n",
    "    Functions to automatically assign missing coordinates\n",
    "    and measures metadata will be implemented in cf_xarray.\n",
    "    See https://github.com/xarray-contrib/cf-xarray/issues/201\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    ds: xarray.Dataset\n",
    "        Dataset whose variable attributes are modified in place\n",
    "    \"\"\"\n",
    "\n",
    "    for varname, variable in ds.data_vars.items():\n",
    "\n",
    "        # Add coordinates attribute when the dimensions\n",
    "        # of a coordinate variable are a subset of those of\n",
    "        # a variable\n",
    "        coordinates = []\n",
    "        for coord in sum(ds.cf.coordinates.values(), []):\n",
    "            if set(ds[coord].dims) <= set(variable.dims):\n",
    "                coordinates.append(coord)\n",
    "        # sets an attribute like \"geolon geolat\"\n",
    "        if coordinates:\n",
    "            variable.attrs[\"coordinates\"] = \" \".join(coordinates)\n",
    "        else:\n",
    "            variable.attrs.pop(\"coordinates\", None)\n",
    "\n",
    "        # Add cell_measures attribute when appropriate measures are available\n",
    "        cell_measures = {}\n",
    "        possible_measures = {\n",
    "            \"cell_thickness\",\n",
    "            \"cell_area\",\n",
    "            \"ocean_volume\",\n",
    "        } & set(ds.cf.standard_names)\n",
    "        for stdname in possible_measures:\n",
    "            key = stdname.split(\"_\")[-1]\n",
    "            value = ds.cf.standard_names[stdname]\n",
    "            for measure in value:\n",
    "                if (\n",
    "                    set(ds[measure].dims) <= set(variable.dims)\n",
    "                    and measure != varname\n",
    "                ):\n",
    "                    cell_measures[key] = measure\n",
    "\n",
    "        if cell_measures:\n",
    "            # sets an attribute like \"area: areacello volume: volcello\"\n",
    "            variable.attrs[\"cell_measures\"] = \" \".join(\n",
    "                [f\"{k}: {v}\" for k, v in cell_measures.items()]\n",
    "            )\n",
    "        else:\n",
    "            variable.attrs.pop(\"cell_measures\", None)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "152e1360",
   "metadata": {},
   "source": [
    "## MOM6\n",
    "\n",
    "Read grid and data variables for a MOM6 ocean model simulation\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd4714c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open grid and data variables, then merge them together to one dataset\n",
    "grid = xr.open_dataset(MOM6_GRID_PATH, chunks={})\n",
    "ds = xr.open_dataset(\n",
    "    MOM6_DATA_URL,\n",
    "    chunks={\"time\": 1},\n",
    ")\n",
    "ds_mom6 = xr.merge([grid, ds], compat=\"override\")\n",
    "\n",
    "# Illustrate the equivalent of a curvilinear grid case,\n",
    "# where axes and coordinates are different\n",
    "axes = [\"xh\", \"xq\", \"yh\", \"yq\"]\n",
    "ds_mom6 = ds_mom6.drop_vars(axes)\n",
    "ds_mom6 = ds_mom6.assign_coords({axis: ds_mom6[axis] for axis in axes})\n",
    "ds_mom6 = ds_mom6.set_coords(\n",
    "    [\n",
    "        var\n",
    "        for var in ds_mom6.variables\n",
    "        for prefix in [\"geo\"]\n",
    "        if var.startswith(prefix)\n",
    "    ]\n",
    ")\n",
    "assign_coordinates_and_cell_measures(ds_mom6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5a37e1f",
   "metadata": {},
   "source": [
    "## CMIP6 Ocean\n",
    "\n",
    "Read a historical CMIP6 simulation from the ACCESS-CM2 climate model.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0382b999",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use intake-esm to access data on Pangeo Cloud\n",
    "col = intake.open_esm_datastore(CMIP6_OCE_CATALOG)\n",
    "cat = col.search(**CMIP6_OCE_EXPERIMENT)\n",
    "\n",
    "ddict = cat.to_dataset_dict(\n",
    "    zarr_kwargs={\n",
    "        \"consolidated\": True,\n",
    "        \"decode_times\": True,\n",
    "        \"use_cftime\": True,\n",
    "    }\n",
    ")\n",
    "_, ds_cmip6_oce = ddict.popitem()\n",
    "assign_coordinates_and_cell_measures(ds_cmip6_oce)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02541ba8",
   "metadata": {},
   "source": [
    "## CMIP6 Ice\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1c47d9e",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_cmip6_ice = xr.open_dataset(CMIP6_ICE_PATH)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87d8fbd0",
   "metadata": {},
   "source": [
    "## NCEP\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74374c96",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_ncep = xr.tutorial.open_dataset(NCEP_PATH)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae1686f5",
   "metadata": {},
   "source": [
    "# Data processing and analysis\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7509a31",
   "metadata": {},
   "source": [
    "## Overview of cf-xarray\n",
    "\n",
    "cf-xarray uses Xarray's plugin interface, or \"accessors\", to provide extensive\n",
    "functionality on both Datasets and DataArrays under the `.cf` namespace.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01f99bda",
   "metadata": {},
   "source": [
    "### `.cf` is an entrypoint for cf-xarray functionality\n",
    "\n",
    "When `cf_xarray` is imported, the `cf` accessor is automatically added to any\n",
    "`xarray` object. cf-xarray is able to wrap most of Xarray's functions. The repr\n",
    "for `Dataset.cf` prints out a list of detected \"CF names\" and the corresponding\n",
    "dataset-specific variable names. cf-xarray parses the attributes associated with\n",
    "each variable in the Dataset to build these mappings.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c38514f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# After `import cf_xarray`, the cf_xarray accessor has been added to the xarray object\n",
    "ds_mom6.cf"
   ]
  },
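  {
   "cell_type": "markdown",
   "id": "5a1e0c11",
   "metadata": {},
   "source": [
    "To make the attribute parsing concrete, here is a minimal sketch on a small\n",
    "synthetic dataset. The names `toy`, `temp`, `t`, `yy`, and `xx` are invented for\n",
    "illustration and do not come from any dataset used in this notebook. The `axis`\n",
    "and `standard_name` attributes are what cf-xarray reads to build its mappings, so\n",
    "`longitude` can stand in for `xx`:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a1e0c12",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cf_xarray  # noqa: F401 (importing registers the .cf accessor)\n",
    "import numpy as np\n",
    "import xarray as xr\n",
    "\n",
    "# Synthetic dataset: the names are arbitrary, the attributes carry the meaning\n",
    "toy = xr.Dataset(\n",
    "    {\n",
    "        'temp': (\n",
    "            ('t', 'yy', 'xx'),\n",
    "            np.arange(24.0).reshape(2, 3, 4),\n",
    "            {'standard_name': 'air_temperature'},\n",
    "        )\n",
    "    },\n",
    "    coords={\n",
    "        't': ('t', [0, 1], {'axis': 'T'}),\n",
    "        'yy': ('yy', [10.0, 20.0, 30.0], {'axis': 'Y', 'standard_name': 'latitude'}),\n",
    "        'xx': ('xx', [0.0, 90.0, 180.0, 270.0], {'axis': 'X', 'standard_name': 'longitude'}),\n",
    "    },\n",
    ")\n",
    "\n",
    "# Mappings built from the attributes\n",
    "print(toy.cf.axes)\n",
    "print(toy.cf.standard_names)\n",
    "\n",
    "# Zonal mean without knowing that the longitude variable is called 'xx'\n",
    "toy_zonal_mean = toy.cf.mean('longitude')\n",
    "print(toy_zonal_mean['temp'].dims)"
   ]
  },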
  {
   "cell_type": "markdown",
   "id": "35dc79a2",
   "metadata": {},
   "source": [
    "### Use CF metadata in standard xarray methods\n",
    "\n",
    "With standard `xarray` syntax, one would specify the variable names on the\n",
    "right-hand side of the mappings printed above. With `cf_xarray`, one can instead\n",
    "use the standardized \"CF names\" on the left-hand side.\n",
    "\n",
    "For example, the next two cells show two ways of calculating an average along\n",
    "the vertical dimensions:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eac283a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# xarray way:\n",
    "ds_mom6.mean([\"z_i\", \"z_l\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2d99c3b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# cf_xarray way:\n",
    "# By calling .cf.mean we can provide \"Z\" which is then rewritten to [\"z_i\", \"z_l\"]\n",
    "# This statement is entirely equivalent to ds_mom6.mean([\"z_i\", \"z_l\"])\n",
    "ds_mom6.cf.mean(\"Z\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d7c5e66",
   "metadata": {},
   "source": [
    "`cf_xarray` knows that `z_i` and `z_l` correspond to `Z` axes because the two\n",
    "variables carry the attribute `cartesian_axis: Z`, which cf-xarray recognizes.\n",
    "A full list of criteria used by cf-xarray is documented\n",
    "[here](https://cf-xarray.readthedocs.io/en/latest/criteria.html).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "342d8146",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "for var_name in [\"z_i\", \"z_l\"]:\n",
    "    print(f\"{var_name}: {ds_mom6[var_name].attrs['cartesian_axis']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "402619a1",
   "metadata": {},
   "source": [
    "### Dictionaries mapping CF keys to variable names\n",
    "\n",
    "The example object contains variables lying on staggered grids. Therefore, a CF\n",
    "key can be associated with multiple variables. `cf_xarray` provides several\n",
    "properties that return dictionaries mapping CF keys to lists of variable names,\n",
    "such as:\n",
    "\n",
    "- `.cf.axes`\n",
    "- `.cf.coordinates`\n",
    "- `.cf.cell_measures`\n",
    "- `.cf.standard_names`\n",
    "- `.cf.bounds`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c230c06",
   "metadata": {},
   "outputs": [],
   "source": [
    "# maps \"axes\" to variable names\n",
    "ds_mom6.cf.axes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88432c71",
   "metadata": {},
   "outputs": [],
   "source": [
    "# maps CF standard names to variable names\n",
    "ds_mom6.cf.standard_names"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3958ee34",
   "metadata": {},
   "source": [
    "### Dealing with incomplete metadata\n",
    "\n",
    "The usefulness of cf-xarray fully depends on the amount of CF-compliant metadata\n",
    "present in a dataset. Although many datasets have incomplete metadata, in most\n",
    "cases one can guess appropriate metadata by looking at variable names.\n",
    "\n",
    "We can use `Dataset.cf.guess_coord_axis` to identify, guess, and add missing CF\n",
    "metadata for \"axes\" (\"X\", \"Y\", \"Z\", \"T\") and \"coordinates\" (\"latitude\",\n",
    "\"longitude\", \"time\"). It does so by using regular expressions to parse variable\n",
    "names and make reasonable guesses.\n",
    "\n",
    "We also demonstrate the `verbose` mode so that the user can double check\n",
    "cf-xarray's inferences.\n",
    "\n",
    "First, note that no `X` or `Y` axes variables have been detected in the current\n",
    "dataset:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "743908a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The \"axes\" property maps axes names 'X', 'Y', 'Z', 'T' to variable names in the dataset\n",
    "# here the metadata only identify 'Z' and 'T'\n",
    "ds_mom6.cf.axes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f5b8a68",
   "metadata": {},
   "source": [
    "Now we will ask `cf_xarray` to autoguess more axes:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01aa494a",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_mom6 = ds_mom6.cf.guess_coord_axis(verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "067d4441",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The `X` and `Y` axes variables that have been detected are sensible!\n",
    "ds_mom6.cf.axes"
   ]
  },
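  {
   "cell_type": "markdown",
   "id": "5a1e0c21",
   "metadata": {},
   "source": [
    "The same guessing step can be tried on a small synthetic dataset. The sketch\n",
    "below assumes coordinate names `xh` and `yh` (borrowed from the MOM6 naming\n",
    "above) that carry no CF attributes at all; `bare` and `guessed` are illustrative\n",
    "names, and the exact attributes added may differ between cf-xarray versions:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a1e0c22",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cf_xarray  # noqa: F401 (importing registers the .cf accessor)\n",
    "import xarray as xr\n",
    "\n",
    "# Coordinates without any attributes: nothing for cf-xarray to detect yet\n",
    "bare = xr.Dataset(coords={'xh': [0.0, 1.0, 2.0], 'yh': [10.0, 20.0]})\n",
    "print('before:', bare.cf.axes)\n",
    "\n",
    "# guess_coord_axis returns a new object with the guessed attributes added\n",
    "guessed = bare.cf.guess_coord_axis()\n",
    "print(' after:', guessed.cf.axes)\n",
    "print('xh attributes:', guessed['xh'].attrs)"
   ]
  },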
  {
   "cell_type": "markdown",
   "id": "37b56ff8",
   "metadata": {},
   "source": [
    "### Indexing using CF keys\n",
    "\n",
    "CF metadata precisely describes the physical quantities being represented by all\n",
    "variables. More importantly, CF conventions also describe links between\n",
    "different variables in a dataset.\n",
    "\n",
    "Here we examine the \"sea floor depth\" variable. First, we pick it out using\n",
    "standard xarray syntax:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51fe7d33",
   "metadata": {},
   "outputs": [],
   "source": [
    "xr_da = ds_mom6[\"deptho\"]\n",
    "xr_da"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a57a153d",
   "metadata": {},
   "source": [
    "Note that, in the output above, the `cell_measures` attribute indicates that the\n",
    "`areacello` variable contains the appropriate \"cell area\" for this variable (see\n",
    "under \"Attributes\"). However, `areacello` is not attached to the DataArray\n",
    "(see under \"Coordinates\").\n",
    "\n",
    "Now we pick out the variable using the `cf` accessor and the appropriate\n",
    "standard name:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ce75df9",
   "metadata": {},
   "outputs": [],
   "source": [
    "cf_da = ds_mom6.cf[\"sea_floor_depth_below_geoid\"]\n",
    "cf_da"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bedf329f",
   "metadata": {},
   "source": [
    "Now notice that `areacello` is present under \"Coordinates\". This is because\n",
    "`cf_xarray` decodes CF metadata linking variables with each other (e.g.,\n",
    "`coordinates`, `cell_measures`, `ancillary_variables`).\n",
    "\n",
    "As opposed to `xr_da`, `cf_da` extracted in the previous cell contains all\n",
    "`cell_measures` associated with the variable extracted.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2c21b0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "additional_coords = set(cf_da.coords) - set(xr_da.coords)\n",
    "print(\"Cell measure extracted by cf_xarray:\", additional_coords)"
   ]
  },
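  {
   "cell_type": "markdown",
   "id": "5a1e0c31",
   "metadata": {},
   "source": [
    "The link between a variable and its cell measure can be reproduced on a\n",
    "synthetic dataset. In this sketch, `toy_depth_ds`, `depth`, and `cell_area` are\n",
    "made-up names; the only assumption is that `depth` declares its measure through\n",
    "a `cell_measures` attribute, as the MOM6 variable does:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a1e0c32",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cf_xarray  # noqa: F401 (importing registers the .cf accessor)\n",
    "import numpy as np\n",
    "import xarray as xr\n",
    "\n",
    "toy_depth_ds = xr.Dataset(\n",
    "    {\n",
    "        'depth': (\n",
    "            ('y', 'x'),\n",
    "            np.full((2, 3), 100.0),\n",
    "            {\n",
    "                'standard_name': 'sea_floor_depth_below_geoid',\n",
    "                'cell_measures': 'area: cell_area',\n",
    "            },\n",
    "        ),\n",
    "        'cell_area': (('y', 'x'), np.ones((2, 3)), {'standard_name': 'cell_area'}),\n",
    "    }\n",
    ")\n",
    "\n",
    "# Plain xarray indexing ignores the cell_measures attribute\n",
    "toy_xr_da = toy_depth_ds['depth']\n",
    "# Indexing through .cf follows the attribute and attaches the measure\n",
    "toy_cf_da = toy_depth_ds.cf['sea_floor_depth_below_geoid']\n",
    "print(set(toy_cf_da.coords) - set(toy_xr_da.coords))"
   ]
  },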
  {
   "cell_type": "markdown",
   "id": "bf41fc54",
   "metadata": {},
   "source": [
    "### Automagic plotting\n",
    "\n",
    "`cf_xarray` automagically sets some optional keyword arguments for plotting\n",
    "functions. As opposed to `xarray`, in the example below `cf_xarray` assigns the\n",
    "appropriate coordinates to the plot axes (i.e., longitude and latitude).\n",
    "`cf_xarray` does so by parsing the `\"coordinates\"` attribute to identify the\n",
    "appropriate latitude and longitude variables:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5a8ef4e",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, (xr_ax, cf_ax) = plt.subplots(1, 2, figsize=(12, 4))\n",
    "\n",
    "# left: xarray plot\n",
    "xr_da.plot(ax=xr_ax)\n",
    "xr_ax.set_title(\"xarray\")\n",
    "\n",
    "# right: cf_xarray plot\n",
    "cf_da.cf.plot(ax=cf_ax)\n",
    "cf_ax.set_title(\"cf_xarray\")\n",
    "\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50a26737",
   "metadata": {},
   "source": [
    "### CF keys expansion\n",
    "\n",
    "As mentioned above, the example dataset is characterized by multiple dimensions\n",
    "associated with the same spatial axes. Such information is decoded by\n",
    "`cf_xarray` and is used under the hood of wrapped functions. In the example\n",
    "below, the CF Axes keys (i.e., \"X\", \"Y\", and \"Z\") are expanded and multiple\n",
    "dimensions are sliced at once:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "713cc436",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_mom6_sliced = ds_mom6.cf.isel(\n",
    "    X=slice(10), Y=slice(10), Z=slice(10), T=slice(10)\n",
    ")\n",
    "print(\"Original dataset sizes:\", dict(ds_mom6.sizes))\n",
    "print(\"  Sliced dataset sizes:\", dict(ds_mom6_sliced.sizes))"
   ]
  },
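  {
   "cell_type": "markdown",
   "id": "5a1e0c41",
   "metadata": {},
   "source": [
    "A minimal sketch of the same expansion on a synthetic staggered grid follows.\n",
    "The dataset `staggered` and its dimension names `x_c` (cell centres) and `x_e`\n",
    "(cell edges) are hypothetical; both dimensions are tagged with `axis: X`, so a\n",
    "single `X` keyword is expected to slice them together:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a1e0c42",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cf_xarray  # noqa: F401 (importing registers the .cf accessor)\n",
    "import numpy as np\n",
    "import xarray as xr\n",
    "\n",
    "# Two staggered X dimensions: cell centres ('x_c') and cell edges ('x_e')\n",
    "staggered = xr.Dataset(\n",
    "    {\n",
    "        'tracer': (('x_c',), np.zeros(4)),\n",
    "        'velocity': (('x_e',), np.zeros(5)),\n",
    "    },\n",
    "    coords={\n",
    "        'x_c': ('x_c', np.arange(4) + 0.5, {'axis': 'X'}),\n",
    "        'x_e': ('x_e', np.arange(5.0), {'axis': 'X'}),\n",
    "    },\n",
    ")\n",
    "\n",
    "# 'X' expands to both dimensions, so one keyword slices them together\n",
    "staggered_sliced = staggered.cf.isel(X=slice(2))\n",
    "print('Original sizes:', dict(staggered.sizes))\n",
    "print('  Sliced sizes:', dict(staggered_sliced.sizes))"
   ]
  },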
  {
   "cell_type": "markdown",
   "id": "ba758631",
   "metadata": {},
   "source": [
    "## Use case examples\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2a6aa2b",
   "metadata": {},
   "source": [
    "### Seamlessly extract statistics from CF-compliant datasets\n",
    "\n",
    "`cf_xarray` allows one to use the same code on a wide variety of CF-compliant\n",
    "objects that each have their own nomenclature.\n",
    "\n",
    "There are two approaches to leveraging cf-xarray in applications.\n",
    "\n",
    "#### Access variables through the `.cf` interface\n",
    "\n",
    "In the example below, we define a function that uses many `cf_xarray` features,\n",
    "then apply it to objects with different dimension and coordinate names. All\n",
    "`cf_xarray` functionality is accessed using the `.cf` accessor.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7d18744",
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_top_10m_temp_anomaly(ds):\n",
    "    \"\"\"\n",
    "    Compute the volume-weighted mean temperature of the top 10 m in 2003,\n",
    "    expressed as an anomaly relative to its time mean.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    ds: xarray.Dataset\n",
    "        Dataset to analyze\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    DataArray\n",
    "    \"\"\"\n",
    "\n",
    "    # Compute the anomaly while preserving variable attributes\n",
    "    with xr.set_options(keep_attrs=True):\n",
    "        # Extract ocean potential temperature\n",
    "        da = ds.cf[\"sea_water_potential_temperature\"]\n",
    "        # Fill missing cell volumes with zeros\n",
    "        da = da.cf.assign_coords(volume=da.cf.coords[\"volume\"].fillna(0))\n",
    "        # Select temperature in the top 10m in 2003\n",
    "        da = da.cf.sel(T=\"2003\", Z=slice(0, 10))\n",
    "        # Compute volume-weighted mean temperature\n",
    "        da = da.cf.weighted(\"volume\").mean([\"X\", \"Y\", \"Z\"])\n",
    "        # Calculate an anomaly relative to the time mean\n",
    "        da = da - da.cf.mean(\"T\")\n",
    "\n",
    "    # Update metadata\n",
    "    da.attrs[\"standard_name\"] += \"_anomaly\"\n",
    "    da.attrs[\"long_name\"] += \" Anomaly\"\n",
    "\n",
    "    return da.squeeze(drop=True)\n",
    "\n",
    "\n",
    "# Run the function on two different datasets and compare the results\n",
    "compute_top_10m_temp_anomaly(ds_mom6).cf.plot(label=\"ds_mom6\")\n",
    "compute_top_10m_temp_anomaly(ds_cmip6_oce).cf.plot(label=\"ds_cmip6_oce\")\n",
    "_ = plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61555fe3",
   "metadata": {},
   "source": [
    "#### Standardize datasets using cf_xarray\n",
    "\n",
    "Alternatively, `cf_xarray` provides utility functions to rename variables and\n",
    "dimensions in one object to match another object. Matching variables/dimensions\n",
    "are determined using CF metadata. One might choose to use this approach of\n",
    "standardizing datasets prior to passing them through a data processing pipeline.\n",
    "\n",
    "Here we illustrate the `rename_like`\n",
    "[feature](https://cf-xarray.readthedocs.io/en/latest/generated/xarray.DataArray.cf.rename_like.html#xarray.DataArray.cf.rename_like).\n",
    "cf_xarray also supports renaming datasets through `.cf.rename`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47235891",
   "metadata": {},
   "outputs": [],
   "source": [
    "mom6_da = ds_mom6.cf[\"sea_water_potential_temperature\"]\n",
    "cmip6_da = ds_cmip6_oce.cf[\"sea_water_potential_temperature\"]\n",
    "renamed_mom6_da = mom6_da.cf.rename_like(cmip6_da)\n",
    "print(\"        MOM6 dimensions:\", mom6_da.dims)\n",
    "print(\"       CMIP6 dimensions:\", cmip6_da.dims)\n",
    "print(\"renamed MOM6 dimensions:\", renamed_mom6_da.dims)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de225520",
   "metadata": {},
   "source": [
    "### Regridding with xESMF\n",
    "\n",
    "`cf-xarray` is used by [xESMF](https://pangeo-xesmf.readthedocs.io), a\n",
    "regridding package wrapping the powerful Fortran libray \"ESMF\". This example\n",
    "will regrid sea ice concentration data extracted from the CCCma-CanESM5 model, a\n",
    "participant in the CMIP6 experiment.\n",
    "\n",
    "Our original sea ice data is on a tripolar grid. The target grid is a regular\n",
    "grid used by the NCEP reanalysis dataset that ships with Xarray. This regridding\n",
    "problem requires providing grid cell corners to xESMF in a specific\n",
    "CF-compatible format. Here we illustrate utility functions provided by cf_xarray\n",
    "to make this task easy and convenient.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3657c286",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's look at the grid shape itself and the data for one time step\n",
    "fig, axs = plt.subplots(ncols=2, figsize=(12, 4))\n",
    "\n",
    "# Notice how with .cf we will use the same keyword arguments\n",
    "# Although here we explicitely pass the coordinate standard names for the plot axes,\n",
    "# cf_xarray default scatter plot would produce the same results.\n",
    "scatter_kwargs = dict(x=\"longitude\", y=\"latitude\", s=0.1)\n",
    "\n",
    "# CMIP6: Input grid\n",
    "ds_cmip6_ice.cf.plot.scatter(**scatter_kwargs, ax=axs[0])\n",
    "axs[0].set_title(\n",
    "    \"The input horizontal grid points as seen on a lat/lon map.\"\n",
    "    \"\\nOnly the northern hemisphere is shown.\"\n",
    ")\n",
    "axs[0].set_ylim(0, 90)\n",
    "\n",
    "# NCEP: Target grid\n",
    "ds_ncep.cf.plot.scatter(**scatter_kwargs, ax=axs[1])\n",
    "axs[1].set_title(\"The target horizontal grid points\")\n",
    "axs[1].set_ylim(0, 90)\n",
    "\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84c05828",
   "metadata": {},
   "source": [
    "We will regrid the sea ice data using the \"conservative\" method. Please refer to\n",
    "the xESMF documentation for details about the\n",
    "[different algorithms](https://xesmf.readthedocs.io/en/latest/notebooks/Compare_algorithms.html).\n",
    "The important information here is that the \"conservative\" regridding algorithms\n",
    "requires the grid points coordinates, but also the grid _corners_ coordinates.\n",
    "While the target grid doesn't provide them, they are easily computable with the\n",
    "help of `cf_xarray`:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42b5d7fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make a reasonable guess of the bounds of the spatial coordinates:\n",
    "ds_ncep = ds_ncep.cf.add_bounds([\"latitude\", \"longitude\"])\n",
    "ds_ncep"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a834a69",
   "metadata": {},
   "source": [
    "This was easy since the grid is regular (i.e., latitude and longitude are 1D).\n",
    "Inferring bounds of 2D grids is\n",
    "[not yet supported](https://github.com/xarray-contrib/cf-xarray/issues/163) by\n",
    "cf-xarray.\n",
    "\n",
    "Luckily, our sea ice data includes the corner coordinates in the\n",
    "`vertices_latitude` and `vertices_longitude` variables. However, xESMF expects a\n",
    "format that is different from the CF convention followed here. No worries,\n",
    "`cf_xarray` has an helper method just for this:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "056a84b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the bounds variable and convert them to \"vertices\" format\n",
    "# Order=None, means that we do not know if the bounds are\n",
    "# listed clockwise or counterclockwise, so we ask cf_xarray to try both.\n",
    "lat_corners = cfxr.bounds_to_vertices(\n",
    "    ds_cmip6_ice.vertices_latitude, \"vertices\", order=None\n",
    ")\n",
    "lon_corners = cfxr.bounds_to_vertices(\n",
    "    ds_cmip6_ice.vertices_longitude, \"vertices\", order=None\n",
    ")\n",
    "\n",
    "# We are using special variable names \"lon_b\" and \"lat_b\" for easier detection by xESMF\n",
    "ds_in = ds_cmip6_ice.assign(lon_b=lon_corners, lat_b=lat_corners)\n",
    "ds_in"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bc7c77b",
   "metadata": {},
   "source": [
    "Finally, the regridding is performed with xESMF. Under the hood, it uses\n",
    "cf-xarray to get the coordinates and their bounds, so we do not need to worry\n",
    "about renaming. The only exception is the input grid's corners, where we used\n",
    "hardcoded variable names because of the difference between xESMF's and CF's\n",
    "syntaxes for 2D grid bounds.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd0d67ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Regrid\n",
    "regridder = xe.Regridder(ds_in, ds_ncep, \"conservative\")\n",
    "sic_reg = regridder(ds_in.cf[\"sea_ice_area_fraction\"])\n",
    "\n",
    "# Plot the results\n",
    "sic_reg.isel(time=0).plot()\n",
    "_ = plt.title(\"Regridded sic data (Jan 2020)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83149fcf-f999-41cc-b4cf-4f466394d344",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "\n",
    "This notebook was a quick walkthrough of a few core cf-xarray features that\n",
    "enable scaling analysis pipelines _across_ datasets. It is fairly common for\n",
    "datasets to not have consistent terminology, and be imperfectly tagged with CF\n",
    "attributes. cf_xaray both allows you to leverage the presence of attributes, and\n",
    "provides utility functions to quickly fix imperfect tagging. For more see the\n",
    "[documentation](https://cf-xarray.readthedocs.io/en/latest/), specifically the\n",
    "[introductory notebook](https://cf-xarray.readthedocs.io/en/latest/examples/introduction.html).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69f6124d-8a59-42b3-ba48-b4aee3501c93",
   "metadata": {},
   "source": [
    "# References\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8513c110-e1d3-4da5-b6fd-6c4ae3e101af",
   "metadata": {},
   "source": [
    "Adcroft, Alistair, Whit G Anderson, V Balaji, Chris Blanton, Mitchell Bushuk,\n",
    "Carolina O Dufour, John P Dunne, Stephen M Griffies, Robert Hallberg, Matthew J\n",
    "Harrison, Isaac M Held, Malte Jansen, Jasmin G John, John P Krasting, Amy R\n",
    "Langenhorst, Sonya Legg, Zhi Liang, Colleen McHugh, Aparna Radhakrishnan,\n",
    "Brandon G Reichl, Anthony Rosati, Bonita L Samuels, Andrew Shao, Ronald J\n",
    "Stouffer, Michael Winton, Andrew T Wittenberg, Baoqiang Xiang, Niki Zadeh, and\n",
    "Rong Zhang, October 2019: The GFDL Global Ocean and Sea Ice Model OM4.0: Model\n",
    "Description and Simulation Features. Journal of Advances in Modeling Earth\n",
    "Systems, 11(10), DOI:10.1029/2019MS001726.\n",
    "\n",
    "Cherian, Deepak, Almansi, Mattia, Bourgault, Pascal, keewis, Kent, Julia, Thyng,\n",
    "Kristen, … Chauhan, Subhendra Singh. (2021, May 11). xarray-contrib/cf-xarray:\n",
    "(Version v0.5.2). Zenodo. http://doi.org/10.5281/zenodo.4749736\n",
    "\n",
    "Hoyer, Stephan, Hamman, Joe, Roos, Maximilian, keewis, Cherian, Deepak,\n",
    "Fitzgerald, Clark … Bovy, Benoit. (2021, May 19). pydata/xarray: v0.18.2\n",
    "(Version v0.18.2). Zenodo. http://doi.org/10.5281/zenodo.4774304\n",
    "\n",
    "Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L.,\n",
    "Iredell, M., Saha, S., White, G., Woollen, J., Zhu, Y., Chelliah, M., Ebisuzaki,\n",
    "W., Higgins, W., Janowiak, J., Mo, K. C., Ropelewski, C., Wang, J., Leetmaa, A.,\n",
    "Reynolds, R., Jenne, R., & Joseph, D. (1996). The NCEP/NCAR 40-Year Reanalysis\n",
    "Project, Bulletin of the American Meteorological Society, 77(3), 437-472.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "toc": {
   "base_numbering": "1",
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "312px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}