Wednesday, December 20, 2017

Meteo #25 - Opening NetCDF Data With GrADS Control File

NetCDF (or 'nc') is one of the most popular data formats in the geoscience universe, so it is no wonder that many global datasets, including climate and meteorological data, are distributed in nc format. While most of these files can easily be opened and used by various data processing software, sometimes... yes, sometimes s**t happens. This time, I would like to share how to open nc data using a GrADS descriptor/control file (CTL).

First, one may ask: why use GrADS to open nc data? For more experienced people, the real question is probably: why the hell use a GrADS CTL to open an nc file??

Here are a few reasons to use GrADS to open an nc file:
  1. Built-in nc library. Users don't need to go through the troublesome NetCDF installation just to open nc data; all they need is GrADS or OpenGrADS, which is much easier to install. Pre-installed NetCDF binaries and libraries are still recommended, though, for reasons explained later in this post.
  2. Light, fast and free. This is an all-time classic reason. And yes, of course you can open an nc file with Matlab, ArcGIS or other sophisticated software, but they are mostly resource hogs and expensive.
  3. Efficient. Yup, you can also write a program (e.g. in FORTRAN) to open an nc file. However, GrADS will save you a lot of time because you don't need to write, compile and debug a program.
Then why use a GrADS CTL file to open an nc file when you can easily open it with the GrADS 'sdfopen' or even 'xdfopen' commands?

The funny thing is, not all nc files can be opened with the above commands. You may sometimes encounter this annoying error message:

gadsdf: SDF file has no discernable X coordinate

This happens mostly because the nc file doesn't conform to the COARDS conventions (just Google it, in case you've never heard of them). A 'good' nc file has a header or metadata section (hence the name SDF, or Self-Describing File) containing complete information about the dataset, e.g. dimensions (x,y,z,t), variables, etc. The GrADS 'sdfopen' and 'xdfopen' commands need to read that header in order to open nc data, and when they find incomplete information in a 'bad' nc file, that annoying message appears. While most nc files follow the COARDS conventions, some do not, and that's exactly why a GrADS CTL file is so useful.
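One concrete thing COARDS expects is a coordinate variable for each axis: a 1-D variable with the same name as its dimension (for example `float lon(lon)` with units of degrees_east). WRF output instead stores longitudes in a 3-D array, `XLONG(Time, south_north, west_east)`, so no dimension has a coordinate variable, which is most likely what triggers the error above. The Python sketch below checks for this, assuming the header has already been reduced to two plain dicts; that representation and the function name are my own, not part of GrADS or NetCDF.

```python
def dims_without_coordinate_variable(dimensions, variables):
    """Return the dimension names that have no COARDS-style coordinate variable.

    dimensions: dict mapping dimension name -> size
    variables:  dict mapping variable name -> tuple of its dimension names
    A coordinate variable is a 1-D variable whose name equals its dimension.
    """
    missing = []
    for dim in dimensions:
        if variables.get(dim) != (dim,):
            missing.append(dim)
    return missing


# A tiny excerpt modelled on the WRF-CHEM file used in this post.
# XLONG holds longitudes, but it is 3-D, so it is not a coordinate variable.
dims = {"Time": 393, "bottom_top": 29, "south_north": 109, "west_east": 99}
vars_ = {
    "XLONG": ("Time", "south_north", "west_east"),
    "BC2": ("Time", "bottom_top", "south_north", "west_east"),
}
print(dims_without_coordinate_variable(dims, vars_))
# ['Time', 'bottom_top', 'south_north', 'west_east']
```

A COARDS-friendly file with a `lon(lon)` variable would return an empty list for that dimension. This is only a rough check; GrADS also looks at attributes such as units when identifying axes.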

The idea is that the GrADS CTL file is used as a 'replacement' header instead of the original nc header. While it requires more effort than just using 'sdfopen' or 'xdfopen', you gain more control over the data by overriding the original descriptor.

Before opening an nc file with a GrADS CTL, you need to know two important pieces of information about the data:
  1. Dimensions. Since nc data is gridded, you need to know at least the number of grid points in the x, y and z directions, as well as t (time) and e (ensemble) if needed.
  2. Variable names and their dimension order. Since gridded data is stored in a matrix-like sequential order, you should know how the dimensions of each variable are ordered in the file.
If you already know both, great. Otherwise, you'll need a tool to read the dimensions and variables of the nc file. One of the best-known tools for this is ncdump, which is installed automatically along with NetCDF. That's why it's recommended to have the NetCDF binaries and libraries installed before you work with such files.

Ok. Assuming you don't know anything about the nc file you would like to open, here are the steps to open it using a GrADS CTL. For this example, I use an nc file output from a WRF-CHEM model simulation, on Linux with pre-installed NetCDF and OpenGrADS version 2.1.0.

1. Getting Header Information 

In the Linux shell, make a symbolic link to the nc file you'd like to open. This just makes things easier and is not obligatory, so you can skip this step if you want. For this example, I link the nc file wrfout_d01_2017-05-01_00:00:00 to a file named testnc, because the original file name is too long.

$ ln -sf wrfout_d01_2017-05-01_00:00:00 testnc

Run ncdump with the -h option, which prints only the header of the nc file (or the link).

$ ncdump -h testnc

Once the header is shown, scroll to the top of it. You should find something like this:

       dimensions:
        Time = UNLIMITED ; // (393 currently)
        DateStrLen = 19 ;
        west_east = 99 ;
        south_north = 109 ;
        bottom_top = 29 ;
        bottom_top_stag = 30 ;
        soil_layers_stag = 4 ;
        west_east_stag = 100 ;
        south_north_stag = 110 ;
        dust_erosion_dimension = 3 ;
        klevs_for_dust = 1 ;
        bio_emissions_dimension_stag = 41 ;
        klevs_for_dvel = 1 ;
        vprm_vgcls = 8 ;

Those are the dimensions we need. In this example, there are 393 time steps (Time), 99 grid points in the x direction (west_east), 109 in the y direction (south_north) and 29 in the z direction (bottom_top). The *_stag dimensions are used by WRF's staggered variables (e.g. wind components) and are not needed here. Dimension names differ from one nc file to another, so you should know at least something about the data before opening it. Otherwise, use common sense to guess them, e.g. x is usually related to the west-east or longitude direction.
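If you'd rather not read the sizes off by eye, the `dimensions:` section is regular enough to parse. Below is a minimal Python sketch, assuming the header text is in the layout shown above (one `name = size ;` per line, with UNLIMITED dimensions followed by a `// (N currently)` comment); `parse_dimensions` is a hypothetical helper of mine, not an ncdump feature.

```python
import re

# Matches lines such as "west_east = 99 ;" and
# "Time = UNLIMITED ; // (393 currently)" from `ncdump -h` output.
DIM_LINE = re.compile(
    r"^\s*(\w+)\s*=\s*(?:(\d+)|UNLIMITED)\s*;(?:\s*//\s*\((\d+) currently\))?"
)


def parse_dimensions(header_text):
    """Return {dimension_name: size} from the 'dimensions:' section."""
    sizes = {}
    in_dims = False
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped == "dimensions:":
            in_dims = True
            continue
        if in_dims and stripped.endswith(":"):
            break  # reached the next section, e.g. "variables:"
        if in_dims:
            match = DIM_LINE.match(line)
            if match:
                size = match.group(2) or match.group(3)
                if size is not None:  # skip UNLIMITED without a current length
                    sizes[match.group(1)] = int(size)
    return sizes


# An excerpt of the header above (ncdump indents entries with tabs).
sample = """netcdf testnc {
dimensions:
\tTime = UNLIMITED ; // (393 currently)
\tDateStrLen = 19 ;
\twest_east = 99 ;
\tsouth_north = 109 ;
\tbottom_top = 29 ;
\tbottom_top_stag = 30 ;
variables:
\tchar Times(Time, DateStrLen) ;
"""
dims = parse_dimensions(sample)
print(dims["Time"], dims["west_east"], dims["south_north"], dims["bottom_top"])
# 393 99 109 29
```

To feed it a real header, you could capture the output of `ncdump -h testnc` with Python's `subprocess.run(..., capture_output=True, text=True)`.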

Next, scroll down the header to find the variables you would like to access. For this example, I want to access a variable named BC2 (Hydrophilic Black Carbon). Its entry looks like this:

         float BC2(Time, bottom_top, south_north, west_east) ;
                BC2:FieldType = 104 ;
                BC2:MemoryOrder = "XYZ" ;
                BC2:description = "Hydrophilic Black Carbon" ;
                BC2:units = "ug/kg-dryair" ;
                BC2:stagger = "" ;
                BC2:coordinates = "XLONG XLAT XTIME" ;

Pay attention to the first line of the variable header. It shows the dimension order of the variable: Time, bottom_top, south_north, west_east. Mapped to the GrADS dimensions, the order is t, z, y, x. You may want to note each variable and its dimension order, because you will need to put them into the GrADS CTL file later. You don't need to list every variable in the nc file if you only want to access a few of them (e.g. 5 out of 100).
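The mapping from ncdump dimension names to GrADS letters is a simple lookup, keeping the order exactly as ncdump prints it. A small Python sketch, assuming WRF-style dimension names; the `AXIS_OF` table and `grads_dim_order` function are hypothetical and need their own entries for other files (e.g. lon, lat, lev, time).

```python
import re

# Hypothetical table for WRF-style dimension names -> GrADS axis letters.
AXIS_OF = {
    "Time": "t",
    "bottom_top": "z",
    "south_north": "y",
    "west_east": "x",
}

# e.g. "float BC2(Time, bottom_top, south_north, west_east) ;"
DECL = re.compile(r"^\s*\w+\s+(\w+)\(([^)]*)\)\s*;")


def grads_dim_order(declaration):
    """Return (variable_name, 't,z,y,x'-style order) for one ncdump declaration."""
    match = DECL.match(declaration)
    if match is None:
        raise ValueError("not a variable declaration: %r" % declaration)
    name = match.group(1)
    dims = [d.strip() for d in match.group(2).split(",")]
    # A KeyError here means a dimension (e.g. a *_stag one) is not in AXIS_OF.
    axes = [AXIS_OF[d] for d in dims]
    return name, ",".join(axes)


print(grads_dim_order("float BC2(Time, bottom_top, south_north, west_east) ;"))
# ('BC2', 't,z,y,x')
```

Raising on unknown names is deliberate: silently skipping a staggered dimension would produce a wrong entry in the CTL.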

2. Making GrADS CTL File

Create a new CTL file with your favorite text editor (notepad, vi, vim, gedit, etc.). The contents of a CTL file for nc data are almost the same as those of a normal CTL file. For my example, it looks like this:

DSET ^testnc
TITLE This is experimental
DTYPE netcdf
UNDEF 99999.0
XDEF 99 LINEAR 1 1
YDEF 109 LINEAR 1 1
ZDEF 29 LINEAR 1 1
TDEF 393 LINEAR 00Z01MAY2017 1hr
VARS 5
DUSTLOAD_5=>dustload5 0 t,y,x Total dust loading
BC1=>bc1 29 t,z,y,x Hydrophobic Black Carbon (ug/kg-dryair)
BC2=>bc2 29 t,z,y,x Hydrophilic Black Carbon (ug/kg-dryair)
OC1=>oc1 29 t,z,y,x Hydrophobic Organic Carbon (ug/kg-dryair)
OC2=>oc2 29 t,z,y,x Hydrophilic Organic Carbon (ug/kg-dryair)
ENDVARS

DSET gives the path of the nc file (or its link) you would like to open; the ^ prefix tells GrADS that the path is relative to the directory containing the CTL file. TITLE is just a title, so you can write anything. DTYPE gives the file format; put 'netcdf' here. UNDEF gives the undefined (missing) value of the variables; if you don't know it, just put -99.9e8 or any other 'extreme' value you like.

XDEF, YDEF, ZDEF and TDEF give the dimension sizes you got from ncdump, together with the coordinate of the first grid point and the grid spacing in world coordinates. If you don't know the first coordinate or the spacing of the spatial dimensions (XDEF, YDEF and ZDEF), just put LINEAR 1 1 after each entry, which simply numbers the grid points 1, 2, 3 and so on. Otherwise, put the first coordinate and the spacing of each dimension in world coordinates. For example:

XDEF 99 LINEAR 67.732000 0.311591836734694
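With LINEAR, GrADS needs the first coordinate and a constant increment, so if you know the first and last longitude of the grid, the increment is (last - first) / (n - 1). The Python sketch below reproduces the XDEF line above from the longitude range GrADS reports for this example later in the post (67.732 to 98.268 over 99 points); `linear_def` is my own helper name. Keep in mind that WRF grids are usually on a map projection, so an evenly spaced lon/lat XDEF/YDEF may only approximate the real grid.

```python
def linear_def(axis, n, first, last):
    """Build an XDEF/YDEF 'LINEAR' line for n evenly spaced points from first to last."""
    if n < 2:
        raise ValueError("need at least two grid points to derive a spacing")
    increment = (last - first) / (n - 1)
    return f"{axis}DEF {n} LINEAR {first:.6f} {increment:.15f}"


# Longitude range of this example's grid (LON set to 67.732 98.268).
print(linear_def("X", 99, 67.732, 98.268))
# XDEF 99 LINEAR 67.732000 0.311591836734694
```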
   
For TDEF, define the first time stamp and the time interval between records. In my example, the first record is at 00 UTC on 01MAY2017 (00Z01MAY2017 in GrADS notation), with a 1-hour interval.
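The GrADS time stamp has the form hhZddMMMyyyy, and with a LINEAR TDEF the valid time of index t is simply start + (t - 1) x interval. A short Python sketch, with hypothetical helper names, that builds the TDEF line above and checks which time `set t 100` should land on:

```python
from datetime import datetime, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def grads_time(moment):
    """Format a datetime in the GrADS TDEF style, e.g. 00Z01MAY2017."""
    # Month names are spelled out so the result doesn't depend on the locale,
    # as strftime("%b") would.
    return f"{moment:%H}Z{moment:%d}{MONTHS[moment.month - 1]}{moment:%Y}"


def time_at(t, start, step):
    """Valid time of GrADS time index t (1-based) for a LINEAR TDEF."""
    return start + (t - 1) * step


start = datetime(2017, 5, 1, 0)
print(f"TDEF 393 LINEAR {grads_time(start)} 1hr")
# TDEF 393 LINEAR 00Z01MAY2017 1hr
print(time_at(100, start, timedelta(hours=1)))
# 2017-05-05 03:00:00
```

The second result matches the "Time values set: 2017:5:5:3" line in the GrADS session at the end of this post.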

VARS gives the number of variables you'd like to access, and it must match the number of variable entries that follow. In my example, I access only 5 out of hundreds of variables in the nc file.

The next entries are what make this nc CTL different from a normal GrADS CTL. List the variables you want to access with this syntax:

[VAR_NAME_IN_NC]=>[VAR_NAME_IN_GrADS] [NUMBER_OF_Z_LEVELS] [GRID_ORDER] [VAR_DESCRIPTION]

For example, I want to access Hydrophilic Black Carbon in the nc data. First write its original name (BC2 in the nc file), then the '=>' sign, followed by the variable name to be used in GrADS, which can be any name you like (bc2 in my case). You can even use the same name as in the nc file if you want.

Next, define the number of Z levels for the variable. Since BC2 is a 4D variable (x, y, z and t), the value should be 29, the same as the Z dimension of the data. For a 3D variable such as DUSTLOAD_5, which has only x, y and t (i.e. a single level), the value should be 0.

Finally, put the dimension order you found in the ncdump output (see step 1). For example, the order for BC2 from ncdump is Time, bottom_top, south_north, west_east, so you write it as t,z,y,x. In the end, the entries should look like this:

BC2=>bc2 29 t,z,y,x Hydrophilic Black Carbon (ug/kg-dryair)
DUSTLOAD_5=>dustload5 0 t,y,x Total dust loading

The rest (the variable description) is free text, so you can write anything you like to describe the variable. Don't forget to put ENDVARS at the end of the file.
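The entry syntax and the level rule above fit in a few lines of Python. This is a sketch with a hypothetical `vars_entry` helper: it gives 0 levels to any variable without a z dimension and the full Z size otherwise, which reproduces both example lines.

```python
def vars_entry(nc_name, grads_name, dims, nz, description):
    """Build one CTL variable line: NC_NAME=>grads_name LEVELS ORDER DESCRIPTION.

    dims is the GrADS dimension order, e.g. ["t", "z", "y", "x"].
    A variable without a z dimension gets 0 levels; otherwise it gets nz.
    """
    levels = nz if "z" in dims else 0
    return f"{nc_name}=>{grads_name} {levels} {','.join(dims)} {description}"


print(vars_entry("BC2", "bc2", ["t", "z", "y", "x"], 29,
                 "Hydrophilic Black Carbon (ug/kg-dryair)"))
# BC2=>bc2 29 t,z,y,x Hydrophilic Black Carbon (ug/kg-dryair)
print(vars_entry("DUSTLOAD_5", "dustload5", ["t", "y", "x"], 29,
                 "Total dust loading"))
# DUSTLOAD_5=>dustload5 0 t,y,x Total dust loading
```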

Save the CTL file with any name you like, for example: wrfout1.ctl.
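If you open many similar files, you might prefer to generate the CTL instead of typing it. A minimal Python sketch, assuming the XDEF/YDEF/ZDEF/TDEF lines and the variable entries are already prepared as strings; `ctl_text` is my own name, and the one design choice worth noting is that VARS is counted from the entry list, so it can never disagree with the number of entries.

```python
def ctl_text(dset, xdef, ydef, zdef, tdef, var_lines,
             title="This is experimental", undef="99999.0"):
    """Return the text of a GrADS CTL file for a netCDF data set."""
    lines = [
        f"DSET {dset}",
        f"TITLE {title}",
        "DTYPE netcdf",
        f"UNDEF {undef}",
        xdef, ydef, zdef, tdef,
        f"VARS {len(var_lines)}",  # always equals the number of entries below
        *var_lines,
        "ENDVARS",
    ]
    return "\n".join(lines) + "\n"


text = ctl_text(
    "^testnc",
    "XDEF 99 LINEAR 1 1", "YDEF 109 LINEAR 1 1", "ZDEF 29 LINEAR 1 1",
    "TDEF 393 LINEAR 00Z01MAY2017 1hr",
    ["DUSTLOAD_5=>dustload5 0 t,y,x Total dust loading",
     "BC2=>bc2 29 t,z,y,x Hydrophilic Black Carbon (ug/kg-dryair)"],
)
print(text, end="")
```

The result can then be saved with, for example, `pathlib.Path("wrfout1.ctl").write_text(text)`.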

3. Opening The NC Data

This is the last step, and it's no different from the normal way of opening binary data with GrADS: just open the CTL file.

ga-> open wrfout1.ctl
Scanning description file:  wrfout1.ctl
Data file wrfout1 is open as file 1
LON set to 67.732 98.268
LAT set to 6.049 36.867
LEV set to 1 1
Time values set: 2017:5:1:0 2017:5:1:0
E set to 1 1
ga-> set t 100
Time values set: 2017:5:5:3 2017:5:5:3
ga-> d bc2
Contouring: 0.001 to 0.014 interval 0.001

And here's the result of my example (with x and y using real world coordinates):



From this point on, you can do anything with the nc data. You can add as many variables as you like to the CTL file, or even save the variable data to a binary file for further use :-)
