Developer Guide

This guide provides an overview of OPS internals for developers who wish to contribute to OPS, add new backends, or understand how the code generation and runtime library work.

Architecture Overview

OPS consists of two main components:

  1. Code Generator (ops_translator/): A Python-based source-to-source translator that parses user applications (using libclang for C++ and fparser2 for Fortran) and generates parallel code for various backends.

  2. Runtime Library (ops/c/ and ops/fortran/): Backend-specific implementations that handle data management, parallelization, and communication.

┌─────────────────────────────────────────────────────────────────────┐
│                         User Application                            │
│                    (ops_par_loop calls + kernels)                   │
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     Code Generator (ops_translator)                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────┐  │
│  │   Parser    │───>│   Scheme    │───>│   Jinja2 Templates      │  │
│  │ (libclang/  │    │  (target    │    │  (loop_host, master_    │  │
│  │  fparser2)  │    │   logic)    │    │   kernel, etc.)         │  │
│  └─────────────┘    └─────────────┘    └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Generated Parallel Code                       │
│      (CUDA, HIP, SYCL, OpenMP, OpenMP Offload + MPI variants)       │
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      Runtime Library (ops/c/src/)                   │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
│  │  core/   │  │  cuda/   │  │  sycl/   │  │   mpi/   │    ...      │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘             │
└─────────────────────────────────────────────────────────────────────┘

Code Generator (ops_translator)

The code generator lives in ops_translator/ops-translator/. It is written in Python and parses C++ through the Clang Python bindings (libclang) and Fortran through fparser2.

Directory Structure

ops_translator/
├── ops-translator/          # Main translator code
│   ├── __main__.py          # Entry point & CLI argument handling
│   ├── scheme.py            # Code generation schemes (genLoopHost)
│   ├── target.py            # Target definitions (Cuda, Sycl, Hip, etc.)
│   ├── ops.py               # OPS constructs (Loop, Arg, Dat, etc.)
│   ├── store.py             # Application, Program, ParseError classes
│   ├── util.py              # Utilities, KernelProcess class
│   ├── language.py          # Language definitions (C++, Fortran)
│   ├── jinja_utils.py       # Jinja2 environment setup
│   ├── cpp/                 # C++ specific code
│   │   ├── parser.py        # Clang-based C++ parser
│   │   ├── schemes.py       # C++ target scheme implementations
│   │   └── translator/      # Kernel/program translators
│   └── fortran/             # Fortran specific code
├── resources/               # Code generation resources
│   └── templates/           # Jinja2 templates
│       ├── cpp/             # C++ templates
│       │   ├── loop_host.cpp.j2      # Base loop host template
│       │   ├── master_kernel.cpp.j2  # Master kernel file
│       │   ├── cuda/                 # CUDA-specific templates
│       │   ├── sycl/                 # SYCL-specific templates
│       │   ├── mpi_openmp/           # MPI+OpenMP templates
│       │   └── ...
│       └── fortran/         # Fortran templates
└── .python/                 # Python virtual environment for Makefile builds (generated by `make python`, not in version control)
                             # CMake builds create ops_venv/ under ${CMAKE_INSTALL_PREFIX}/translator/ops_translator/ instead

Key Classes

Target (target.py)

Defines code generation targets and their configurations:

class Target(Findable):
    name: str                    # Target identifier (e.g., "cuda", "sycl")
    kernel_translation: bool     # Whether kernel code needs transformation
    config: Dict[str, Any]       # Target-specific configuration

Available targets:

Target Class    Name             Description
-------------   --------------   -------------------------
MPIOpenMP       mpi_openmp       CPU sequential/OpenMP
Cuda            cuda             NVIDIA GPUs via CUDA
Hip             hip              AMD GPUs via HIP
Sycl            sycl             Intel/AMD/NVIDIA via SYCL
OpenMPOffload   openmp_offload   GPU via OpenMP target
F2CCuda         f2c_cuda         Fortran-to-C CUDA
F2CHip          f2c_hip          Fortran-to-C HIP
F2CSycl         f2c_sycl         Fortran-to-C SYCL
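
Looking a target up by the name given in the table can be sketched with a small class-level registry. This is a minimal sketch, not the code in target.py: the internals of Findable, the find() method, and the attribute values given to Cuda and Sycl are all assumptions made for illustration. Only the class names, the target names, and the register() call come from this guide.

```python
from typing import Any, Dict, List


class Findable:
    """Hypothetical base class: one registry per class that register() is called on."""

    instances: List["Findable"]

    @classmethod
    def register(cls, new_cls: type) -> None:
        # Create the registry lazily so Target and Scheme would not share one list.
        if "instances" not in cls.__dict__:
            cls.instances = []
        cls.instances.append(new_cls())

    @classmethod
    def find(cls, name: str) -> "Findable":
        # Assumed lookup helper: linear search over the registered instances.
        for inst in cls.__dict__.get("instances", []):
            if inst.name == name:
                return inst
        raise KeyError(f"unknown {cls.__name__.lower()}: {name}")


class Target(Findable):
    name: str
    kernel_translation: bool
    config: Dict[str, Any]


class Cuda(Target):
    name = "cuda"
    kernel_translation = True   # placeholder value, not taken from target.py
    config: Dict[str, Any] = {}  # real configuration omitted


class Sycl(Target):
    name = "sycl"
    kernel_translation = True   # placeholder value, not taken from target.py
    config: Dict[str, Any] = {}  # real configuration omitted


Target.register(Cuda)
Target.register(Sycl)

selected = Target.find("sycl")
print(type(selected).__name__)
```

Keeping the registry on the class that register() is called on means a Scheme registry built the same way would stay separate from the Target one.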

Scheme (scheme.py)

Orchestrates code generation for a language/target combination:

class Scheme(Findable):
    lang: Lang                   # Language (C++, Fortran)
    target: Target               # Target backend
    loop_host_template: Path     # Template for loop host code
    
    def genLoopHost(...) -> Tuple[str, str, str]:
        """Generate loop host code from template"""
        # 1. Translate kernel if needed
        # 2. Process kernel text (KernelProcess)
        # 3. Render Jinja2 template
        return (generated_code, extension, kernel_func)
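
The three steps listed in genLoopHost can be sketched as follows. This is an illustration under stated assumptions rather than the real method: string.Template from the standard library stands in for Jinja2, the template text is made up, and translate_kernel and gen_loop_host are hypothetical names. The "newgpu.cpp" extension is borrowed from the new-backend example in this guide.

```python
from string import Template
from typing import Tuple

# string.Template replaces a Jinja2 template so the sketch runs on the
# standard library alone; the template text itself is invented.
LOOP_HOST = Template(
    "// host wrapper for $name\n"
    "$kernel_func\n"
    "void ops_par_loop_$name() { /* launch code for the target */ }\n"
)


def translate_kernel(kernel_func: str) -> str:
    """Hypothetical step 1: a device target might add a function qualifier."""
    return "__device__ " + kernel_func


def gen_loop_host(name: str, kernel_func: str, kernel_translation: bool,
                  extension: str) -> Tuple[str, str, str]:
    # 1. Translate the kernel only when the target asks for it.
    if kernel_translation:
        kernel_func = translate_kernel(kernel_func)
    # 2. Process the kernel text (here: only trim surrounding whitespace).
    kernel_func = kernel_func.strip()
    # 3. Render the template and return the same triple shape as genLoopHost.
    code = LOOP_HOST.substitute(name=name, kernel_func=kernel_func)
    return code, extension, kernel_func


code, ext, kernel = gen_loop_host(
    "init", "void init(double *a) { a[0] = 0.0; }\n", True, "newgpu.cpp")
print(code)
```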

KernelProcess (util.py)

Handles kernel text transformations for different backends:

class KernelProcess:
    def clean_kernel_func_text(kernel_func)     # Remove OPS-specific markers
    def cuda_complex_numbers(kernel_func)       # Handle complex number support
    def sycl_kernel_func_text(kernel_func, consts)  # SYCL-specific transforms
    def get_kernel_body_and_arg_list(kernel_func)   # Extract body and args
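
To make the last of these transformations concrete, here is a rough sketch of splitting a kernel's source text into its body and parameter names. It is a simplified stand-in and not the KernelProcess implementation: it assumes a single function with no parentheses or commas inside the parameter types, and the axpy kernel is an invented example.

```python
import re
from typing import List, Tuple


def get_kernel_body_and_arg_list(kernel_func: str) -> Tuple[str, List[str]]:
    """Split kernel source text into (body, parameter names). Sketch only."""
    open_paren = kernel_func.index("(")
    close_paren = kernel_func.index(")", open_paren)
    params = kernel_func[open_paren + 1:close_paren]
    # The parameter name is the last identifier of each comma-separated entry.
    args = [re.findall(r"\w+", p)[-1] for p in params.split(",") if p.strip()]
    # The body is everything between the first "{" after the parameter list
    # and the last "}" in the text.
    body_start = kernel_func.index("{", close_paren)
    body_end = kernel_func.rindex("}")
    return kernel_func[body_start + 1:body_end].strip(), args


kernel = (
    "void axpy(const ACC<double> &x, ACC<double> &y, const double *alpha) "
    "{ y(0, 0) += (*alpha) * x(0, 0); }"
)
body, args = get_kernel_body_and_arg_list(kernel)
print(args)
print(body)
```

A parser-based approach would be needed for parameter types that contain commas, such as multi-argument templates.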

Parser (cpp/parser.py)

Uses libclang to parse C++ source files:

def parseLoops(translation_unit, program) -> None:
    """Parse ops_par_loop calls from C++ source"""
    # Find macro instantiations and function calls
    # Extract loop information (kernel, block, range, arguments)

Jinja2 Templates

Templates use Jinja2 syntax with OPS-specific filters and tests. Key template variables:

Variable           Description
----------------   ------------------------------------------------
lh                 Loop host object (kernel name, args, ndim, etc.)
kernel_func        Original kernel function text
kernel_body        Extracted kernel body
args_list          Argument name list
target             Current target object
consts_in_kernel   Constants used in kernel

Example template structure (loop_host.cpp.j2):

{% block host_prologue %}
    // Setup code: args, dimensions, pointers
{% endblock %}

{% block kernel_call %}
    // Parallel launch code (varies by target)
{% endblock %}

{% block host_epilogue %}
    // Cleanup, timing, diagnostics
{% endblock %}

Adding a New Backend

To add a new backend (e.g., “newgpu”):

  1. Define Target in target.py:

class NewGPU(Target):
    name = "newgpu"
    kernel_translation = True
    config = {"grouped": True, "device": 11}

Target.register(NewGPU)

  2. Create Scheme in cpp/schemes.py:

class CppNewGPU(CppScheme):
    target = NewGPU()
    loop_host_template = Path("cpp/newgpu/loop_host.cpp.j2")
    master_kernel_template = Path("cpp/newgpu/master_kernel.cpp.j2")
    loop_kernel_extension = "newgpu.cpp"

Scheme.register(CppNewGPU)

  3. Create Templates in resources/templates/cpp/newgpu/:

    • loop_host.cpp.j2 - Loop host wrapper

    • master_kernel.cpp.j2 - Master include file

  4. Add Runtime Support in ops/c/src/newgpu/ (if needed)

  5. Update Makefiles in makefiles/ directory
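
When working through the template step above, a quick check that both template files exist can save a failed translator run. The helper below is a hypothetical convenience script, not part of OPS. It assumes the templates/cpp/<backend>/ layout shown in the directory tree, and the demo runs against a temporary directory instead of a real checkout.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the template step of the new-backend checklist.
REQUIRED_TEMPLATES = ("loop_host.cpp.j2", "master_kernel.cpp.j2")


def missing_templates(templates_root: Path, backend: str) -> List[str]:
    """Return the required template files that do not exist yet for a backend."""
    backend_dir = templates_root / "cpp" / backend
    return [name for name in REQUIRED_TEMPLATES
            if not (backend_dir / name).is_file()]


# Demo on a throwaway directory that contains only one of the two templates.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cpp" / "newgpu").mkdir(parents=True)
    (root / "cpp" / "newgpu" / "loop_host.cpp.j2").write_text("{# stub #}\n")
    missing = missing_templates(root, "newgpu")

print(missing)
```

In a real checkout the root would be the resources/templates directory under ops_translator/.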


Runtime Library (ops/c/)

The runtime library provides backend implementations for data management, parallel execution, and communication.

Directory Structure

ops/c/
├── include/                 # Public headers
│   ├── ops_lib_core.h       # Core OPS API
│   ├── ops_seq.h            # Sequential backend header
│   ├── ops_cuda.h           # CUDA backend header
│   ├── ops_hip.h            # HIP backend header
│   ├── ops_sycl.h           # SYCL backend header
│   └── ...
├── src/                     # Implementation
│   ├── core/                # Core library (shared across backends)
│   │   ├── ops_lib_core.cpp # Core API implementation
│   │   ├── ops_lazy.cpp     # Lazy execution & tiling
│   │   └── ops_instance.cpp # OPS instance management
│   ├── sequential/          # Sequential backend
│   ├── cuda/                # CUDA backend
│   ├── hip/                 # HIP backend
│   ├── sycl/                # SYCL backend
│   ├── mpi/                 # MPI support for all backends
│   │   ├── ops_mpi_core.cpp
│   │   ├── ops_mpi_partition.cpp  # Domain decomposition
│   │   ├── ops_mpi_rt_support_cuda.cpp
│   │   ├── ops_mpi_rt_support_sycl.cpp
│   │   └── ...
│   ├── ompoffload/          # OpenMP offload backend
│   └── tridiag/             # Tridiagonal solver support
└── lib/                     # Compiled libraries

Core Components

ops_lib_core.cpp

  • ops_init() / ops_exit() - Initialization and cleanup

  • ops_decl_block() - Block declaration

  • ops_decl_dat() - Dataset declaration

  • ops_decl_stencil() - Stencil declaration

  • ops_partition() - MPI partitioning trigger

ops_lazy.cpp

  • Lazy execution queue management

  • Tiling plan computation

  • Communication-avoiding optimizations

  • Key structures: ops_kernel_list, tiling_plan
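
The idea behind a lazy execution queue can be shown with a toy model: loops are recorded when they are issued and run only when the queue is flushed, which is the point at which a real runtime could reorder or tile them. The LazyQueue class and its methods are invented for this sketch and do not mirror ops_kernel_list or tiling_plan.

```python
from typing import Callable, List, Tuple


class LazyQueue:
    """Toy model of lazy execution: loops are queued and run only on flush."""

    def __init__(self) -> None:
        self.pending: List[Tuple[str, Callable[[], None]]] = []
        self.executed: List[str] = []

    def enqueue(self, name: str, loop: Callable[[], None]) -> None:
        # Record the loop instead of running it immediately.
        self.pending.append((name, loop))

    def flush(self) -> None:
        # A real runtime could analyse dependencies and build a tiling plan
        # here; this sketch simply runs the loops in issue order.
        for name, loop in self.pending:
            loop()
            self.executed.append(name)
        self.pending.clear()


data = [0.0] * 4


def init() -> None:
    for i in range(len(data)):
        data[i] = 1.0


def double() -> None:
    for i in range(len(data)):
        data[i] *= 2.0


queue = LazyQueue()
queue.enqueue("init", init)
queue.enqueue("double", double)
before = list(data)   # nothing has run yet
queue.flush()         # e.g. triggered when results are needed on the host
after = list(data)
print(before, after)
```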

MPI Support (ops/c/src/mpi/)

  • Domain decomposition (ops_mpi_partition.cpp)

  • Halo exchange management

  • Backend-specific MPI+GPU support:

    • ops_mpi_rt_support_cuda.cpp - CUDA+MPI

    • ops_mpi_rt_support_sycl.cpp - SYCL+MPI

    • ops_mpi_rt_support_hip.cpp - HIP+MPI
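
Domain decomposition and halo regions can be illustrated in one dimension. This is a conceptual sketch only and makes no claim about the algorithm in ops_mpi_partition.cpp: decompose and with_halo are hypothetical helpers that split an index range into near-equal contiguous chunks and then widen each chunk by a halo depth.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def decompose(n: int, ranks: int) -> List[Range]:
    """Split [0, n) into contiguous, near-equal chunks, one per rank."""
    base, extra = divmod(n, ranks)
    bounds: List[Range] = []
    start = 0
    for r in range(ranks):
        # The first `extra` ranks take one additional point each.
        size = base + (1 if r < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def with_halo(bounds: List[Range], n: int, depth: int) -> List[Range]:
    """Widen each owned range by `depth` points, clamped to the domain."""
    return [(max(0, s - depth), min(n, e + depth)) for s, e in bounds]


owned = decompose(10, 3)
stored = with_halo(owned, 10, 1)
print(owned)
print(stored)
```

The points in a stored range that fall outside the owned range are the ones a halo exchange would have to refresh from neighbouring ranks.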

Adding Runtime Support for a New Backend

  1. Create backend directory: ops/c/src/newgpu/

  2. Implement required functions:

    • Device memory allocation/deallocation

    • Data transfer (host ↔ device)

    • Kernel launch wrappers

  3. Add MPI support (if needed): ops/c/src/mpi/ops_mpi_rt_support_newgpu.cpp

  4. Update build system:

    • Add makefiles/Makefile.newgpu

    • Update CMakeLists.txt


Build System

OPS supports two build systems: CMake (recommended) and Makefiles. Both produce the same set of backend libraries and application binaries. For full build instructions see installation.md.

CMake Build System

The top-level CMakeLists.txt orchestrates the entire build: compiler detection, dependency discovery, backend library compilation, translator installation, and optional application builds.

Key CMake Options

Option                 Default          Description
--------------------   --------------   ------------------------------------------------
CMAKE_BUILD_TYPE       -                Build type: Release, Debug, or empty for default
BUILD_OPS_CXX          ON               Build the C/C++ backend libraries
BUILD_OPS_FORTRAN      OFF              Build the Fortran backend libraries
BUILD_OPS_APPS         OFF              Build sample applications (library CMake only)
OPS_TEST               OFF              Enable CTest-based tests
OPS_HIP                OFF              Enable the HIP backend
LEGACY_CODEGEN         OFF              Use the legacy code generator
ENABLE_IEEE            OFF              Enable strict IEEE floating-point flags
OPS_VERBOSE_WARNING    OFF              Show verbose output during build
CMAKE_INSTALL_PREFIX   /usr/local       Library installation directory
APP_INSTALL_DIR        $HOME/OPS-APPS   Application installation directory
OPS_INSTALL_DIR        -                Path to installed OPS library (app CMake only)
GPU_NUMBER             -                Number of GPUs for tests
GPU_ARCH               70               CUDA compute capability (e.g., 80 for A100)
LIBTRID_PATH           -                Path to tridiagonal solver library (optional)

A dash marks an option with no default listed.

Dependency Detection

CMake automatically discovers the following dependencies:

  • MPI (find_package(MPI))

  • HDF5 (find_package(HDF5))

  • CUDA (find_package(CUDAToolkit))

  • OpenMP (find_package(OpenMP))

  • HIP (when OPS_HIP=ON)

  • Python 3.8+

The translator’s Python virtual environment is set up differently depending on the build system:

  • CMake build: ops_translator/CMakeLists.txt copies the translator tree to ${CMAKE_INSTALL_PREFIX}/translator/ops_translator/ and runs setup_venv_cmake.sh directly (using python3 -m venv) to create the venv at ${CMAKE_INSTALL_PREFIX}/translator/ops_translator/ops_venv/. It does not call make python.

  • Makefile build: make python inside ops_translator/ creates the venv under ops_translator/.python/.

Build Structure

CMakeLists.txt                  # Top-level: compiler flags, dependencies, options
├── ops_translator/CMakeLists.txt   # Installs translator + sets up Python venv
├── ops/c/CMakeLists.txt            # Backend libraries (ops_seq, ops_cuda, ops_mpi, etc.)
├── ops/fortran/CMakeLists.txt      # Fortran backend libraries
├── apps/c/CMakeLists.txt           # C/C++ example applications
│   ├── CloverLeaf/CMakeLists.txt
│   ├── shsgc/CMakeLists.txt
│   └── ...
└── apps/fortran/CMakeLists.txt     # Fortran example applications

Library Targets

The CMake build in ops/c/CMakeLists.txt produces these library targets:

CMake Target     Condition               Description
--------------   ---------------------   ---------------------
ops_seq          Always                  Sequential + OpenMP
ops_cuda         CUDAToolkit_FOUND       CUDA single-node
ops_hip          OPS_HIP + HIP_FOUND     HIP single-node
ops_ompoffload   NVHPC compiler + CUDA   OpenMP Offload
ops_mpi          MPI_FOUND               MPI + sequential
ops_mpi_cuda     MPI + CUDA              MPI + CUDA
ops_mpi_hip      MPI + HIP               MPI + HIP
ops_hdf5_seq     HDF5_FOUND              HDF5 I/O (sequential)
ops_hdf5_mpi     HDF5 + MPI              HDF5 I/O (MPI)

All libraries are installed under ${CMAKE_INSTALL_PREFIX}/lib with CMake export files at ${CMAKE_INSTALL_PREFIX}/lib/cmake, allowing downstream projects to use find_package(OPS).

Typical Build Workflow

# Build everything (library + apps)
mkdir build && cd build
cmake .. -DBUILD_OPS_APPS=ON -DCMAKE_INSTALL_PREFIX=$HOME/OPS-INSTALL \
         -DAPP_INSTALL_DIR=$HOME/OPS-APPS -DGPU_NUMBER=1
make
make install

# Or build library and apps separately
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/OPS-INSTALL
make && make install

mkdir appbuild && cd appbuild
cmake ../../apps/c -DOPS_INSTALL_DIR=$HOME/OPS-INSTALL -DAPP_INSTALL_DIR=$HOME/OPS-APPS
make

Adding a New Backend to CMake

  1. Add a new library target in ops/c/CMakeLists.txt (following the pattern of existing backends)

  2. Add conditional detection logic in the top-level CMakeLists.txt if a new dependency is needed

  3. Use the installtarget() macro to register the target for installation and export

  4. Add MPI variant if applicable (create ops_mpi_<backend> target)

Makefile System

The Makefile-based build uses modular includes:

makefiles/
├── Makefile.common          # Common flags and definitions
├── Makefile.c_app           # Main C application makefile
├── Makefile.cuda            # CUDA-specific flags
├── Makefile.hip             # HIP-specific flags
├── Makefile.sycl            # SYCL flags (via Makefile.icx)
├── Makefile.mpi             # MPI flags
└── Makefile.<compiler>      # Compiler-specific settings

Build Targets

For an application named APP, the following targets are generated:

Target                  Description
---------------------   ------------------------------------
$(APP)_dev_seq          Development sequential (no code-gen)
$(APP)_dev_mpi          Development MPI (no code-gen)
$(APP)_seq              Sequential with generated kernels
$(APP)_openmp           OpenMP parallel
$(APP)_mpi              MPI distributed
$(APP)_mpi_openmp       MPI + OpenMP hybrid
$(APP)_tiled            Lazy execution with tiling
$(APP)_cuda             CUDA single GPU
$(APP)_mpi_cuda         MPI + CUDA
$(APP)_sycl             SYCL single device
$(APP)_mpi_sycl         MPI + SYCL
$(APP)_hip              HIP single GPU
$(APP)_mpi_hip          MPI + HIP
$(APP)_ompoffload       OpenMP Offload single GPU
$(APP)_mpi_ompoffload   MPI + OpenMP Offload


Debugging Tips

Code Generator Debugging

# Verbose output
python3 ops-translator -v --file_paths source.cpp

# Dump parsed structure as JSON
python3 ops-translator -d --file_paths source.cpp

# Target specific backend only
python3 ops-translator -t cuda --file_paths source.cpp

Runtime Debugging

# Enable diagnostics
./app_cuda -OPS_DIAGS=2

# Check block decomposition (MPI)
./app_mpi_cuda -OPS_DIAGS=2

// Timing breakdown (C/C++ call placed in the application source, not a shell command)
ops_timing_output(stdout);

Common Issues

Issue                              Cause                               Solution
--------------------------------   ---------------------------------   ---------------------------
GET_MACRO redefined                Name collision with Intel headers   Harmless warning, ignore
printf in SYCL kernel              Variadic functions not allowed      Guard with #ifndef OPS_SYCL
Preprocessor directives stripped   Code generator limitation           Use runtime conditionals


Contributing

To contribute to OPS, follow these steps:

  1. Clone the OPS repository to your local system.

  2. Create a new branch in your cloned repository.

  3. Make changes or contributions in your new branch.

  4. Submit your changes by creating a pull request to the develop branch of the OPS repository.

Contributions in the develop branch will be merged into the master branch when a new release is created.