SURF-NEMO Workflow Overview¶

Workflow Overview¶

The schematic work-flow diagram in Fig. 4.1 outlines the key stages involved in the SURF-NEMO numerical platform. In the latest version, the workflow consists of the following steps:

Model Configuration:
In this step, the user specifies the values of the input parameters for the specific dowscaling experiment, including preprocessing (such as data sources, interpolation methods, etc.), simulation (such as time steps, output frequency, subgrid scale parameterizations, etc.), and postprocessing settings (visualization and data analysis).
Preprocessing (Automated):
This automated phase handles the downloading and preparation of the input datasets required for the simulation, such as bathymetry, coastline, atmospheric, and ocean data.
- Child Meshmask Generation: Generates the Meshmask for the nested model ensuring an accurate representation of the region of interest,
- Input Data Regridding: Remaps the input datasets like bottom topography, atmospheric and tidal forcing, and initial and lateral open boundary conditions on the child model grid.
Simulation (Automated):
In this phase, the NEMO ocean model is executed using the configured parameters. This automated step generates high-resolution fields that provide detailed descriptions of ocean dynamics in the selected region.
Post-Processing: (Automated):
After the simulation is complete, post-processing is performed to visualize and analyze the model outputs. Various options for post-processing are available:
- Visualize Input, Regridded, and Output Datasets: Explore the model’s behavior at each stage by visualizing the input, regridded, and final model output datasets.
- Parent/Child Model Comparisons: Analyzing differences between the parent (coarse-resolution) and child (high-resolution) model outputs.
- Validation with Observational Data: Comparing the simulation results with in-situ measurements or satellite datasets to validate accuracy.
- Data Conversion: Transforming the output datasets into different formats for further analysis, external use, or integration with other tools.

../assets/img/diagram_workflow_SURF1.2.png — Figure 4.1. Work-flow of the Relocatable SURF-NEMO platform.

The graphical calling function flow, as shown in Figure 4.2, illustrates the complete sequence of steps executed by the program from start to finish. This step-by-step representation highlights the key stages the program passes through during its execution, ensuring users can understand the overall workflow. The process begins with child Meshmask generation, followed by input data remapping, progresses through the simulation phase, and concludes with visualization and data analysis.

../assets/img/diagram_funcflow_SURF1.2.png — Figure 4.2. Graphical calling function flow of the Relocatable SURF-NEMO platform.

Many of the computational tasks in the workflow are interdependent, meaning they rely on each other to proceed. Figure 4.3 illustrates the dependency flow graph of these macro-tasks. Each node (labeled A through F) represents a distinct macro-task, and the solid edges between the nodes indicate data dependencies between them.

For example, node A serves as the starting point and branches out to nodes B, C, and D. Node E can only begin once all preceding tasks (B, C, and D) are completed. This flow ensures that tasks are executed in the proper order, maintaining the integrity of the overall process.

../assets/img/workflow_dependency.png — Figure 4.3. Dependency flow graph of macro-tasks.

Model Configuration¶

Each macro-task begins by initializing the input model parameters, which are loaded from the configuration files setParFree.json and setParFixed.json. These files contain user-free and fixed parameters, as detailed in Chapter 5, 6 and Appendix A.

During this phase, the procedures read_inJsonFree and read_inJsonFixed are executed to handle user-defined and fixed input parameters, respectively. These parameters are essential for running the NEMO model and associated pre- and post-processing tasks.

Additionally, the set_pathData and set_fileData procedures are called to establish the file paths and names required for the specific numerical simulation.

Child Meshmask Generation¶

After the model configuration phase, the child grid is generated. The NEMO model uses the Arakawa C grid for spatial discretization, where the state variables are defined on a staggered grid, as shown in Figure 4.4. In the C grid, scalar quantities such as temperature (T), salinity (S), pressure (p), and density ($\rho$) are defined at the center of each grid volume, while velocity components (zonal (u), meridional (v), and vertical (w)) are defined at the edges of the grid volumes, shifted by half a grid width in their respective directions.
During this phase, the following procedures are executed:

Generation of the child 2D mesh.
Interpolation of the source bathymetric dataset onto the generated child grid.
Generation of the child 3D meshmask.

../assets/img/ArakawaC.png — Figure 4.4. The staggered Arakawa C-grid used by NEMO ocean model.

Horizontal grid¶

The horizontal grid is generated using the NEMO code through a configuration called SURF-MESH. In this setup, SURF utilizes a rectangular (latitude-longitude) grid within a spherical coordinate system $\lambda,\varphi$, where $\lambda$ represents longitude and $\varphi$ represents latitude, both expressed in degrees.

To create the horizontal grid , the user specifies the number of grid points $n_{\lambda}$ and $n_{\varphi}$ in the zonal (longitude) and meridional (latitude) directions, respectively, along with the grid resolutions $\Delta \lambda$ and $\Delta \varphi$ (in degrees). Additionally, the grid's starting point $(\lambda,\varphi)_{1,1}$, which corresponds to the first row and column of the T grid, is defined.

The coordinates of the T points on the $\lambda\varphi$ plane are calculated as follows:

$$ \begin{equation} \begin{array}{ll} \lambda_{i,j} = \lambda_{11} + (i-1) \Delta \lambda \hspace{0.5cm} \mbox{with} \hspace{0.2cm} i=1.....n_\lambda \ \varphi_{i,j} = \varphi_{11} + (j-1) \Delta \varphi \hspace{0.5cm} \mbox{with} \hspace{0.2cm} j=1.....n_\varphi \end{array} \end{equation}$$

In this grid system, the u and v points (velocity components) are offset by half a grid cell in the zonal and/or meridional directions, as illustrated in Figure 4.4.

Bathymetry regridding¶

During this phase, the bathymetric dataset is interpolated onto the child grid, which is required to generate the 3D meshmask. The following tasks are carried out:

Accessing and downloading the source bathymetry and coastline datasets.
Manipulating the bathymetry dataset.
Spatial interpolation of the source bathymetry onto the child grid.

Access the bathymetry and coastline datasets¶

A procedure checks if the necessary input datasets are present in the experiment directory $PATH_IDEXP/data/data00/indata/. If any requested data is missing, the procedures downlCoastlineInfile and downlBathyInfile are executed to download the coastline and bathymetry datasets, respectively, from remote ftp or local data repositories as specified in the setParFree.json configuration file.

Manipulating and smoothing bathymetry¶

Before performing the spatial interpolation on the child grid, the source bathymetry dataset can be manipulated as specified in the setParFree.json configuration file. The available manipulation methods include:

Adding a constant value to the surface elevation for the entire nested region (e.g., adjusting the water level for an inland body of water like the Caspian Sea).
Setting maximum and minimum depth values (e.g., enforcing a minimum depth of 5 meters or a maximum depth of less than the actual depth).
Defining land/sea grid points based on the input coastline.
Setting maximum and minimum depths in specific sub-regions (e.g., to mask a particular area).
Smoothing the bathymetry using the Shapiro filter, a high-order horizontal filter that efficiently removes small-scale noise without affecting the physical structures of a field. A Shapiro filter of a 2N order of accuracy is applied to a variable based on the expression: $$\begin{equation} \label{eq:shapiro_filter} \tilde{w_i} = F^{2N}(w_i) = \left[ I + (-1)^{N-1} \frac{\delta^{2N}}{2^{2N}} \right] (w_i) = w_i + (-1)^{N-1} \frac{\delta^{2N} w_i}{2^{2N}} \end{equation}$$ where $\tilde{w_i}$ is the filtered value of variable $w$ at point $x_i$, $I$ is the identity operator and $\delta^{2N}$ is the even composition of the standard difference operator $\delta$ (Richtmyer, 1957). This filter is a discrete symmetric operator with a (2N + 1) point stencil. It acts as a low-pass filter that preserves the low-frequency content (i.e., largest wavelengths) and completely dissipates the high-frequency content (i.e., shortest wavelengths) from the original field.

Interpolation of bathymetric data¶

After manipulating the source bathymetry, spatial interpolation onto the child grid is performed using the Spherical Coordinate Remapping and Interpolation Package (SCRIP). Available interpolation methods are described in Section 4.4.4.

Meshmask and Vertical grid¶

Once the final bathymetry is obtained, the 3D Meshmask can be generated using the MESH module of the NEMO code (within the SURF-MESH configuration). SURF uses a geopotential z-coordinate vertical grid with partial bottom cell representation of the bathymetry. Once the bathymetry $z = H(\lambda, \varphi)$ and the number of levels $n_{z}$ are specified, the vertical location of the w- and t-levels (expressed in metres) is calculated using the following analytic expression: $$ \begin{equation} z(k) = h_{sur} - h_{0} k - h_{1} log [cosh (( k - h_{th}) h_{cr})] \end{equation}$$

In this equation, the parameters $h_{sur}$, $h_0$, $h_1$, $h_{th}$, and $h_{cr}$ need to be specified:

$h_{cr}$ represents the grid's stretching factor, controlling how the grid levels are distributed.
$h_{th}$ defines the approximate model level at which the maximum stretching occurs.

This expression enables stretched z-coordinate vertical levels to be defined, which are smoothly distributed along the water column, with appropriate thinning designed to better resolve the surface and intermediate layers. Through partial cell parameterization, the thickness of the bottom layer can vary as a function of the geographical location $(x,y)_{i,j}$ which allows a better representation of the real bathymetry.

../assets/img/meshOceTUVxy_FC.png — (A) Horizontal grid.

../assets/img/meshOceTz.png — (A) Horizontal grid.

Input data Regridding¶

Regridding, also known as remapping, is the process of transforming data from a source grid to a destination grid while preserving the integrity of the original data. In this section, we explain the spatial extrapolation and interpolation procedures used in SURF to remap input fields onto the child grid.
During this phase, atmospheric and tidal forcing, initial conditions, and lateral open boundary condition datasets are generated for the child grid. The steps involved in this process are:

Accessing and downloading the necessary input datasets
Rotating vector fields (if required)
Extrapolating the input datasets
Performing spatial interpolation of the source datasets onto the child grid
Generating the lateral open boundary condition datasets

Access and download of the input datasets¶

A procedure is executed to check if the necessary input datasets are available in the experiment directory $PATH_IDEXP/data/data00/indata/. If any of the required data are missing, the procedures downlAtmSrc, downlTideSrc, downlOceICSrc, and downlOceBCSrc are automatically executed. These procedures download the atmospheric and tidal forcing, initial conditions, and lateral boundary condition datasets, respectively, for the selected simulation period. The data are retrieved from remote or local repositories, as specified in the configuration file setParFree.json.

Rotation of horizontal velocity u, v¶

When the parent coarse resolution model is defined on a rotated or curvilinear grid (e.g., the global tripolar grid, Fig. 4.6(a)), an additional step is required to interpolate the horizontal velocity fields onto the child grid. In ocean models with a “distorted” grid, velocity vectors are aligned with the grid lines. In a staggered Arakawa C grid system, the components of the velocity field are defined at the cell edges (illustrated by the gray arrows in Fig. 4.6(b)).

A rotation of the velocity components in the latitudinal and longitudinal directions is necessary to convert the vectors from the local system $(x, y)$ to a geographical system $(x^{'}, y^{'})$. This transformation ensures that U represents the zonal component (W-E direction) and V represents the meridional component (S-N direction) of the velocity vector. To achieve this, the vectors must be rotated according to the following equations:

$$ \begin{equation} \begin{array}{ll} U^{'}(x_{t}^{'},y_{t}^{'}) = U(x_{t},y_{t}) \cdot \cos(\alpha_{t}) - V(x_{t},y_{t}) \cdot \sin(\alpha_{t}) \\ V^{'}(x_{t}^{'},y_{t}^{'}) = U(x_{t},y_{t}) \cdot \sin(\alpha_{t}) + V(x_{t},y_{t}) \cdot \cos(\alpha_{t}) \end{array} \end{equation} $$

For parent models that use rotated rectangular grids, the angle $\alpha_{t}$ remains nearly constant across the grid. However, for models with curvilinear tripolar grids (as seen in Fig. 4.6(a)), the angle $\alpha_{t}$ will vary within each grid cell.

Extrapolation method¶

Before performing interpolation, the Sea-Over-Land (SOL) procedure in SURF is used to provide ocean field values in areas near the coastline where the parent model's solutions are not defined. This method iteratively extrapolates the input coarse-resolution ocean fields — such as salinity, temperature, sea surface height, and currents — onto land grid points. This process ensure that ocean fields at child-grid points near the coast can accurately be defined through interpolation.

In addition to ocean fields, the SOL procedure is also applied to various input atmospheric fields using the atmospheric land-sea mask. This prevents contamination from land-based atmospheric data, ensuring accurate representation at sea points near the land-sea boundaries.

The source code for the SOL procedure can be found in the directory $PATH_SURFNEMO/utilities/extrapol/seaoverland.

The resulting files are stored in the directory $PATH_EXP/IDEXP/data/data00/extrapdata/.

Interpolation methods¶

The extrapolation procedure described in the previous section provides the necessary input data for the interpolator. These procedures are based on the Spherical Coordinate Remapping and Interpolation Package (SCRIP) code. SCRIP is a software package designed to compute weights for remapping and interpolating fields between grids in spherical coordinates. It is compatible with any grid on the surface of a sphere and currently supports the following five remapping options:

Conservative remapping: First- and second-order conservative remapping, as described by Jones (1999, Monthly Weather Review, 127, 2204-2210).
Bilinear interpolation: A slightly generalized version that uses a local bilinear approximation (applicable only to logically rectangular grids).
Bicubic interpolation: Similar to bilinear interpolation but for higher-order accuracy (also only for logically rectangular grids).
Distance-weighted averaging: The inverse-distance-weighted average of a user-specified number of nearest-neighbor values.
Particle remapping: A conservative particle-based (Monte-Carlo-like) remapping scheme.

The source code for SCRIP can be found in the directory $PATH_SURFNEMO/nemo/NEMOGCM/TOOLS/WEIGHTS.

Regridding can be broken down into two stages. The first stage involves generating an interpolation weight matrix, which describes how points in the source grid contribute to points in the destination grid. In the second stage, values from the source grid are multiplied by this interpolation weight matrix to produce the corresponding values on the destination grid.

The SCRIP spatial interpolation procedure is applied to all the input fields required for running the NEMO code, including bathymetry, atmospheric and tidal forcing, initial conditions, and lateral open boundary condition datasets. The resulting files are stored in the directory $PATH_EXP/IDEXP/data/data00/regridata/.

Lateral Open Boundary Condition¶

The lateral open boundary condition for the selected nested domain is implemented using the BDY module of NEMO. Two numerical algorithms are used to manage open boundary conditions, depending on the prognostic simulated variables:

Flather scheme (Oddo and Pinardi, 2008): Used for barotropic velocities and sea surface height.
Flow relaxation scheme (Engerdahl, 1995): Applied to baroclinic velocities and active tracers.

In SURF’s default configuration, external data is provided along straight open boundary lines, with the Flow Relaxation area limited to a single internal grid point.

Since the parent coarse-resolution ocean model only provides the total velocity field, the interpolated total velocity field on the child grid is split into its barotropic and baroclinic components. To ensure that total transport across the open boundary is preserved after interpolation, an integral constraint method is applied. The process involves the following steps:

Define the open boundary geometry: This step involves defining the coordinates for the open boundary on the T, U, and V grids using geometry_bdy procedure. The resulting data arrays are written to the coordinates.bdy.nc file.
Extract ocean fields at the open boundary: In this step, ocean fields including active tracers, sea surface height, and barotropic and baroclinic velocities are extract at the open boundary T-U-V grid points using the fields_bdy procedure.

The generated files are stored in the directory $PATH_EXP/IDEXP/data/data00/regridata/.

Integral Constraint at the open boundary¶

The downscaling process in SURF is designed to that the volume transport across the open boundary (OB) of the child model matches that across the corresponding section of the parent model. At the eastern and western boundaries, the U-points (zonal velocities) are imposed using the following condition: $$ \int_{y_2}^{y_1} \int_{-H_{child}}^{\eta_{child}} U_{child} dz\, dy = \int_{y_2}^{y_1} \int_{-H_{parent}}^{\eta_{parent}} U_{parent} dz\, dy $$

Here, $y_1, y_2$ represent the limits of the open boundary section. $\eta_{child}, H_{child}$ are the surface elevation and bathymetry at the boundary in the child model, while $\eta_{parent}, H_{parent}$ represent the surface elevation and bathymetry in the parent model. The terms $U_{parent},U_{child}$ denote the total zonal velocities in the parent and child models, respectively, normal to the western/eastern boundaries.

The corrected velocity component normal to the boundary, $U_{child}$, is calculated according to the method described in N. Pinardi et al. (2003): $$ \begin{equation} \begin{array}{ll} U_{child} (x,y,z,t) = U_{interp} - U_{correction} \end{array} \end{equation}$$ where $U_{interp}$ is the $U_{parent}$ interpolated on the child open boundary points and the velocity correction is given by $$ \begin{equation} \begin{array}{ll} U_{correction} = \frac{M_{interp} - M_{parent}}{S} \end{array} \end{equation}$$

In this equation:

$ M_{interp} = \int_{y_2}^{y_1} \int_{-H_{child}}^{\eta_{child}} U_{interp} dz dy $ is the volume transport of the interpolated zonal velocity across the open boundary in the child model.
$ M_{parent} = \int_{y_2}^{y_1} \int_{-H_{parent}}^{\eta_{parent}} U_{parent} dz dy $ is the volume transport of the parent model across the corresponding boundary section.
$ S = \int_{y_2}^{y_1} \int_{-H_{child}}^{\eta_{child}} dz dy $ is the area of the boundary section in the child model.

These conditions are similarly applied for the meridional velocity component (V-points) at the northern and southern boundaries. The integral constraint procedure ensures that the interpolation process preserves the net transport across the child model's lateral open boundary, preventing any artificial modifications to the total transport.

Simulation¶

Finally, the NEMO Fortran code is executed with the configured model parameters using mpiexec, allowing it to run across multiple processors for efficient parallel computation. During the execution, output files are continuously generated and updated at a fixed frequency. A logfile is also created to track the progress of the model run, providing step-by-step information on the current status.

Once the run is completed, all output files are stored in the experiment directory: $PATH_IDEXP/data/data00/outdata/ (refer to Appendix B for more details).

Post-processing¶

After the simulation is complete, post-processing can be performed to visualize and analyze the model results. Several post-processing options are available:

Visualize Input, Regridded, and Output Datasets: Explore the model’s behavior at each stage by visualizing the input, regridded, and final model output datasets.
Parent/Child Model Comparisons: Analyzing differences between the parent (coarse-resolution) and child (high-resolution) model outputs.
Validation with Observational Data: Comparing the simulation results with in-situ measurements or satellite datasets to validate accuracy.
Data Conversion: Transforming the output datasets into different formats for further analysis, external use, or integration with other tools.