# Gap-filling gridded data

This project worked on getting two different approaches to gap-filling gridded data working: a U-Net gap-filling model developed by Yifei Hang during summer 2024, and DINCAE (Data-Interpolating Convolutional Auto-Encoder) [Barth et al., 2020, Barth et al., 2022].

We were successful in getting a tutorial for the U-Net model working, but not for DINCAE.

The basic approach is the following:

```{mermaid}
graph LR
  A[netCDF/Zarr with time, lat, lon] --> G{to xarray}
  G --> C[standardized Zarr with masks and season]
  C --> D{UNet model}
  D --> E[Predict: xarray with gaps filled]
```
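The "standardized Zarr with masks and season" step above can be sketched in plain NumPy. The names (`chl`, `months`) and the log-transform/zero-fill choices are illustrative assumptions, not the project's exact preprocessing:

```python
import numpy as np

def standardize(chl, months):
    """Build model inputs from a (time, lat, lon) array where clouds are NaN:
    log-scaled data, a cloud mask, and a season index."""
    mask = np.isnan(chl)                       # True where a pixel is cloud-covered
    filled = np.where(mask, 0.0, np.log(chl))  # log-transform, zero-fill the gaps
    season = (np.asarray(months) % 12) // 3    # 0=DJF, 1=MAM, 2=JJA, 3=SON
    return filled, mask, season

chl = np.array([[[0.5, np.nan], [1.0, 2.0]]])  # one time step, 2x2 grid
filled, mask, season = standardize(chl, months=[1])
```

The mask is what lets the U-Net distinguish "true zero" from "no observation" during training.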

## Collaborators

| Name | Role |
| --- | --- |
| Eli Holmes | Project Facilitator |
| Bruna Cândido | Fellow |
| Trina Xavier | Participant |
| Lilac Hong | Participant |

## Planning

### Background

Chlorophyll is a widely used indicator of plankton abundance and thus a key measure of marine productivity and ecosystem health across an ocean that covers nearly 70% of Earth’s surface. Estimating chlorophyll concentrations allows researchers to assess phytoplankton biomass, which supports oceanic food webs and contributes to global carbon cycling. Remote sensing with ocean-color instruments enables large-scale monitoring of chlorophyll-a by detecting light reflected by plankton. However, cloud cover remains a significant challenge, obstructing surface observations and creating gaps in chlorophyll-a data. These gaps limit our ability to monitor marine productivity accurately and to quantify the contribution of plankton to the global carbon cycle.

### Goals

Contribute to the “mind-the-chl-gap” project and create a tutorial on gap-free Indian Ocean gridded data with the U-Net method. For OceanHackWeek 2025, we aimed to extend the existing work by exploring different types of CNN architectures and experimenting with alternative gap-filling tools, such as segmentation_models_pytorch and DINCAE.

### Datasets

```python
import xarray as xr

dataset = xr.open_dataset(
    "gcs://nmfs_odp_nwfsc/CB/mind_the_chl_gap/IO.zarr",
    engine="zarr",
    backend_kwargs={"storage_options": {"token": "anon"}},
    consolidated=True,
)
dataset
```
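Opening a Zarr store this way is lazy, so you can subset before loading any data. The pattern below is built on a small in-memory example so it runs offline; the variable name `chl` and the coordinate values are illustrative, not the actual contents of `IO.zarr`:

```python
import numpy as np
import xarray as xr

# Small stand-in dataset with the same (time, lat, lon) layout.
ds = xr.Dataset(
    {"chl": (("time", "lat", "lon"), np.random.rand(4, 3, 5))},
    coords={"time": np.arange(4), "lat": [0.0, 1.0, 2.0], "lon": np.arange(5.0)},
)

# .sel() with slices stays lazy on a Zarr-backed dataset;
# only .values / .load() triggers an actual read.
subset = ds["chl"].sel(lat=slice(0.0, 1.0), lon=slice(1.0, 3.0))
```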

### Workflow/Roadmap

```{mermaid}
flowchart TD
    A[Zarr data] --> B[Data Preprocessing]
    B --> C[Model Fit]
    C --> D[Result Visualization]
```
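The preprocessing stage typically tiles each (lat, lon) field into fixed-size patches of the shape a CNN expects. A minimal sketch, assuming non-overlapping tiles and an illustrative patch size of 32 (the project's actual patching may differ):

```python
import numpy as np

def to_patches(field, size=32):
    """Split a 2-D field into non-overlapping (size, size) patches,
    dropping any partial tiles at the edges."""
    ny, nx = field.shape[0] // size, field.shape[1] // size
    field = field[: ny * size, : nx * size]
    return field.reshape(ny, size, nx, size).swapaxes(1, 2).reshape(-1, size, size)

patches = to_patches(np.zeros((70, 100)), size=32)  # 2 x 3 = 6 patches
```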

## Results/Findings

[oceanhackweek.org/ohw25_proj_gap/](https://oceanhackweek.org/ohw25_proj_gap/)

Functions are in the `mindthegap` directory.

```python
import mindthegap as mtg
```

## Lessons Learned

- Working with outdated packages can be quite challenging.
- Existing frameworks (e.g., DINCAE) can serve as inspiration but need to be adapted to the specific context.
- Pay attention to memory efficiency: document how much memory is required to run your code and data.
- Collaboration and thorough documentation help improve workflow efficiency.
- Avoid calling `to_numpy()` on the full dataset (time, lat, lon, var). Instead, stream patches directly from the Zarr files in batches or use dask.
- Xarray is powerful, with advanced options available in `icechunk` and `cubed`.
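The memory-related lessons above can be made concrete: iterate over slabs along the time axis instead of materializing the whole cube at once. A minimal sketch; with a real Zarr- or dask-backed array, the per-slab conversion is where the actual read happens:

```python
import numpy as np

def iter_batches(arr, batch=8):
    """Yield consecutive slabs along the time axis; each np.asarray call
    materializes only one slab, never the full (time, lat, lon) cube."""
    for start in range(0, arr.shape[0], batch):
        yield np.asarray(arr[start : start + batch])

batches = list(iter_batches(np.zeros((20, 4, 4)), batch=8))
```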

## References

## Creating the JupyterBook

Create the template in the `book` directory:

```bash
pip install -U jupyter-book
jupyter-book create book
```

Build and push to GitHub. Make sure you are in the `book` directory:

```bash
jupyter-book build .
ghp-import -n -p -f _build/html
```