Calculating annual precipitation anomalies with R & stars

```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE,warning = F, message = F) ``` The [`stars`](https://r-spatial.github.io/stars/) library is a relatively new R framework for spatiotemporal arrays. In this demo, we will unpack a netcdf file to calculate the annual precipitation anomalies for southeastern Australia. This can also be effectively calculated using data frames, however my motivation here is to demonstrate how the `stars` array based workflow achieve this calculation faster and use far less memory. ### Load the libraries ```{r} library(tidyverse) # data wrangler library(stars) # the *star* of this tutorial library(lubridate) # date wrangler library(googledrive) # download data library(scico) # sci-viz color palettes ``` ### Download the data Here is a file with monthly averages for total precip, 2-m air temp, and 2-m dewpoint temperature that I got from the [Copernicus Data Store](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land-monthly-means?tab=form){:target="_blank"} for the ERA5-Land product.\ fyi, the netcdf file is about 44 Mb.\ Note that I am using `stars::read_stars()` instead of `stars::read_ncdf` ```{r eval=FALSE} library(googledrive) temp <- tempfile(fileext = ".nc") dl <- googledrive::drive_download( as_id("1Fo5F--2tfWWZfhTKBOatfM-bDs2stjhB"), path = temp, overwrite = TRUE) raw <- stars::read_stars(temp) ``` ```{r message=FALSE, warning=FALSE, echo=FALSE} raw <- stars::read_stars("/home/sami/srifai@gmail.com/work/research/data_general/proc_data_Oz_fire_recovery/era5-totprecip-t2m-d2m-SEOZ.nc", make_units = F) ``` Get summary of the raw file: ```{r} raw ``` ### Convert to mm/month Following the [ERA5-Land documentation](https://confluence.ecmwf.int/display/CKB/ERA5-Land%3A+data+documentation#ERA5Land:datadocumentation-accumulationsAccumulations) for 'accumulation' variables: > \ > The accumulations in synoptic monthly means (stream=mnth) have units that include "variable_units per forecast_step hours". So for accumulations in this stream: > > The hydrological parameters are in units of "m of water equivalent per forecast_step hours" and so they should be multiplied by 1000 to convert to kgm-2 per forecast_step hours or mm per forecast_step hours. So if I understood that correctly, we will multiply by 24 hours, and 1000 to get mm per month. ```{r} dp <- raw['tp']*(24*1000) ``` ### Aggregate from monthly to yearly sum ```{r} dp_y30 = aggregate(dp %>% filter(time>=ymd("1981-01-01",tz='UTC')) %>% filter(time% filter(time>=ymd("1981-01-01",tz='UTC')) %>% filter(time% st_set_dimensions(., which = 3, values=1981:2020, names = 'year') ``` ### Plot the annual rainfall anomalies for southeastern Australia ```{r} plot(dp_anom, join_zlim = T, breaks = 'equal', col=scico::scico(n=21, palette='roma')) ``` Ok, there are (at least) a couple of things here to nitpick. Should I have calculated the anomaly by some sort of hydraulic year instead of calendar year? Yes, probably. It is entirely possible to do this with `stars`, but I wanted to keep this example short and simple. Should I have retained the units from the original netcdf? Maybe, but I still drop them out of habit because I haven't worked out how to (1) update them along the workflow, and (2) prevent them from breaking random functions. Maybe someday... ```{r} sessionInfo() ```