FILENAME VERIFICATION WITH PYTHON AND JSON

To build and maintain a robust operational system that relies on many datasets, file naming becomes an important way of identifying and describing the data each file contains.

In climate science, this could involve descriptors such as data origin, bias adjustment, variable type, start/end dates, accumulated or instantaneous measurements, grid resolution, etc.

This example uses a JSON (JavaScript Object Notation) file as a lookup table. JSON files are lightweight and can be read easily by most modern programming languages. In Python, a JSON object is loaded as a dictionary, a data type that maps keys to values. In this example it is technically a ‘nested’ dictionary, as it has multiple levels.

The JSON lookup table shown below contains all the allowed elements for the project’s filename structure, including the associated ‘long names’ for each item (more on this later). It also contains the position of each filename element and its character length.


 

JSON Lookup Table
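As a rough illustration of the kind of structure described here, the sketch below parses a small, hypothetical excerpt of such a table. The key names (`period`, `dataset`, `origin`, `variable`) and the field names (`position`, `length`, `allowed`) are assumptions made for this example, not the project's actual layout; only the codes and long names for H, ERA5, ECMW and GHI come from the example filename in this post.

```python
import json

# Hypothetical excerpt of a lookup table. The real project's keys, field
# names and positions may well differ from the ones assumed here.
LOOKUP_JSON = """
{
  "period":   {"position": 0, "length": 1, "allowed": {"H": "Historical"}},
  "dataset":  {"position": 1, "length": 4, "allowed": {"ERA5": "ERA5"}},
  "origin":   {"position": 2, "length": 4, "allowed": {"ECMW": "ECMWF"}},
  "variable": {"position": 3, "length": 3,
               "allowed": {"GHI": "Global Horizontal Irradiance"}}
}
"""

# json.loads turns the JSON object into a nested Python dictionary.
lookup = json.loads(LOOKUP_JSON)

# Outer key = filename element, inner dictionary = its description.
long_name = lookup["variable"]["allowed"]["GHI"]
print(long_name)
```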

As you can see, each dictionary key has a nested dictionary within it. The key corresponds to the filename element. Here is a test example filename relevant to this project:

H_ERA5_ECMW_T639_GHI_0000m_Euro_025d_S200001010000_E200001012300_ACC_MAP_01h_NA-_noc_org_NA_NA—_NA—_NA—.nc

Referencing the JSON table, it is possible to identify by eye that the file contains historical Global Horizontal Irradiance ERA5 data originating from ECMWF, amongst other things. This is all well and good, but what if you need to check hundreds of similar files automatically? This is where Python comes in: a few functions can read the JSON table and check the integrity of the filename string.

 

Import packages:


 

Set some font colours for printing to terminal:
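One common way to do this is with ANSI escape sequences, which most modern terminals interpret as colour changes. The sketch below is an assumption about how such colours might be set up; the constant names and the `colourise` helper are hypothetical.

```python
# ANSI escape sequences for bright foreground colours.
# Constant names are illustrative, not taken from the original script.
RED = "\033[91m"
GREEN = "\033[92m"
YELLOW = "\033[93m"
RESET = "\033[0m"  # switches the terminal back to its default colour


def colourise(text: str, colour: str) -> str:
    """Wrap text in a colour code and reset the colour afterwards."""
    return f"{colour}{text}{RESET}"


print(colourise("PASS", GREEN))
print(colourise("FAIL", RED))
```

Always ending with the reset code avoids the colour leaking into later terminal output.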


 

Load JSON file as a dictionary:
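A minimal sketch of this step, using the standard library's `json.load`. The function name `load_lookup` and the file name `filename_lookup.json` are hypothetical; the demo writes a tiny stand-in table to a temporary directory purely so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_lookup(path):
    """Read a JSON file and return its contents as a (nested) dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demo only: create a tiny hypothetical table on disk, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "filename_lookup.json"  # hypothetical file name
    demo_path.write_text(
        '{"period": {"position": 0, "length": 1, '
        '"allowed": {"H": "Historical"}}}',
        encoding="utf-8",
    )
    lookup = load_lookup(demo_path)

print(type(lookup).__name__, list(lookup))
```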

 

A simple function to print the file name structure in the correct order, for reference purposes:
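A function of this kind might look like the sketch below. It assumes each entry in the lookup table stores its place in the filename under a `position` field; that field name, the element names and the function names are all hypothetical.

```python
# Small hypothetical lookup table, deliberately listed out of order.
LOOKUP = {
    "variable": {"position": 3, "length": 3},
    "period": {"position": 0, "length": 1},
    "origin": {"position": 2, "length": 4},
    "dataset": {"position": 1, "length": 4},
}


def filename_structure(lookup):
    """Return the element names joined in filename order."""
    ordered = sorted(lookup, key=lambda name: lookup[name]["position"])
    return "_".join(ordered)


def print_structure(lookup):
    """Print the filename structure for reference."""
    print(filename_structure(lookup))


print_structure(LOOKUP)  # prints: period_dataset_origin_variable
```

Sorting on the stored position means the printed order does not depend on the order of the keys in the JSON file.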


 

Calling this function will output the following:

 

Another simple function to print all the possible filename elements, for reference:
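As a sketch of the idea, the function below walks a small hypothetical lookup table and lists every allowed code with its long name. The field names `position` and `allowed`, and the function name `describe_elements`, are assumptions for illustration.

```python
# Small hypothetical lookup table: code -> long name for each element.
LOOKUP = {
    "period": {"position": 0, "allowed": {"H": "Historical"}},
    "origin": {"position": 2, "allowed": {"ECMW": "ECMWF"}},
    "variable": {"position": 3,
                 "allowed": {"GHI": "Global Horizontal Irradiance"}},
}


def describe_elements(lookup):
    """Return one line per element name and one per allowed code."""
    lines = []
    for name in sorted(lookup, key=lambda n: lookup[n]["position"]):
        lines.append(f"{name}:")
        for code, long_name in lookup[name]["allowed"].items():
            lines.append(f"  {code} = {long_name}")
    return lines


for line in describe_elements(LOOKUP):
    print(line)
```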


 

Calling this function will output the following:


 

 

Finally, this function checks the filename string against the JSON table and reports whether the correct number of elements is present in the string. It takes one argument as input: your filename as a string (fname).
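The sketch below shows one way such a check could be written. It is not the original function: the lookup table is a shortened, hypothetical one, the field names are assumed, and the extra per-element checks (length and allowed code) go beyond the element count described above. Unlike the function described in the text, it also accepts the lookup table as an optional second argument.

```python
from pathlib import Path

# Shortened hypothetical lookup table (the real one has many more elements).
LOOKUP = {
    "period": {"position": 0, "length": 1, "allowed": {"H": "Historical"}},
    "dataset": {"position": 1, "length": 4, "allowed": {"ERA5": "ERA5"}},
    "origin": {"position": 2, "length": 4, "allowed": {"ECMW": "ECMWF"}},
    "variable": {"position": 3, "length": 3,
                 "allowed": {"GHI": "Global Horizontal Irradiance"}},
}


def check_filename(fname, lookup=LOOKUP):
    """Return a list of problems found in fname (empty list = valid)."""
    parts = Path(fname).stem.split("_")  # drop the extension, then split
    if len(parts) != len(lookup):
        return [f"expected {len(lookup)} elements, found {len(parts)}"]

    problems = []
    for name, spec in lookup.items():
        value = parts[spec["position"]]
        if len(value) != spec["length"]:
            problems.append(f"{name}: '{value}' should be "
                            f"{spec['length']} characters long")
        elif value not in spec["allowed"]:
            problems.append(f"{name}: '{value}' is not an allowed code")
    return problems


print(check_filename("H_ERA5_ECMW_GHI.nc"))  # valid for this toy table
print(check_filename("H_ERA5_GHI.nc"))       # one element missing
```

Returning a list of problems rather than printing directly makes the function easier to reuse when looping over hundreds of files.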


 

Taking the example filename as a string and passing it to the function gives the following output:


 

The motive behind creating these functions is that they can then be called from another Python script running on any system, be that a local machine, virtual machine, server, HPC, etc.

This can be achieved simply by adding the following to the top of your script (the functions are saved in a Python file called ‘filename_utilities.py’):
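In a real script this usually comes down to a single `import filename_utilities` line, provided the file sits next to the script or in a directory on Python's import path. The sketch below demonstrates the mechanism end to end: it writes a stand-in `filename_utilities.py` containing one hypothetical function (`count_elements`, not one of the original functions) to a temporary directory, adds that directory to `sys.path`, and imports it.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in module with a single hypothetical helper, for demo only.
    (Path(tmp) / "filename_utilities.py").write_text(
        "def count_elements(fname):\n"
        "    return len(fname.rsplit('.', 1)[0].split('_'))\n",
        encoding="utf-8",
    )

    # Make the directory importable, then import the module by name.
    sys.path.insert(0, tmp)
    import filename_utilities

    n = filename_utilities.count_elements("H_ERA5_ECMW_GHI.nc")
    sys.path.remove(tmp)

print(n)
```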


 

So what’s the use in having all the long names in the JSON? They come in handy when we need to produce data in a human-readable format, such as when writing metadata and comments. The next blog will focus on parsing the JSON file to automatically generate metadata as comments in an output CSV file.
