# Gap-filling gridded data
This project worked on getting two different approaches to gap-filling gridded data running: the UNet gap-filling model [] developed by Yifei Hang during summer 2024, and DINCAE (Data-Interpolating Convolutional Auto-Encoder) [Barth et al., 2020, Barth et al., 2022].
We succeeded in getting a tutorial for the UNet model working, but not for DINCAE.
The basic approach is the following:
```mermaid
graph LR
A[netcdf/Zarr w time, lat, lon] --> G{to xarray}
G --> C[standardized Zarr w masks and season]
C --> D{UNet model}
D --> E[Predict: xarray with gaps filled]
```
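A minimal sketch of the standardization step is shown below; the variable name `chl`, the mask logic, and the file paths are illustrative assumptions, not the project's exact code:

```python
import xarray as xr

# Hedged sketch: build the standardized Zarr with a gap mask and a season
# variable. "chl" and the paths are assumed names, not the project's own.
ds = xr.open_zarr("input.zarr")

ds["gap_mask"] = ds["chl"].isnull().astype("int8")  # 1 where observations are missing
ds["season"] = ds["time"].dt.season                 # "DJF", "MAM", "JJA", "SON"

ds.to_zarr("standardized.zarr", mode="w")
```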
## Collaborators
| Name | Role |
|---|---|
|  | Project Facilitator |
|  | Fellow |
|  | Participant |
|  | Participant |
## Planning
- Initial idea: Create a tutorial on gap-free Indian Ocean gridded data with the U-Net method
- Slack channel: `ohw25_proj_gap`
## Background
Chlorophyll is a widely used indicator of plankton abundance, and thus a key measure of marine productivity and ecosystem health across the ocean, which covers nearly 70% of Earth's surface. Estimating chlorophyll concentrations allows researchers to assess phytoplankton biomass, which supports oceanic food webs and contributes to global carbon cycling. Remote sensing with ocean-color instruments enables large-scale monitoring of chlorophyll-a by detecting light reflected by plankton. However, cloud cover remains a significant challenge, obstructing surface observations and creating gaps in chlorophyll-a data. These gaps limit our ability to monitor marine productivity accurately and to quantify the contribution of plankton to the global carbon cycle.
## Goals
Contribute to the "mind-the-chl-gap" project and create a tutorial on gap-free Indian Ocean gridded data with the U-Net method. For OceanHackWeek 2025, we aimed to extend the existing work by exploring different types of CNN architectures and experimenting with alternative gap-filling tools, such as segmentation_models_pytorch and DINCAE.
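As a hedged illustration of the segmentation_models_pytorch route, a U-Net for single-variable gap filling might be instantiated as below; the encoder, channel counts, and patch size are assumptions, not the project's settings:

```python
import torch
import segmentation_models_pytorch as smp

# Hedged sketch: a U-Net regressor for gap filling. Encoder choice,
# channel counts, and patch size are illustrative assumptions.
model = smp.Unet(
    encoder_name="resnet34",   # any encoder supported by smp
    encoder_weights=None,      # ImageNet weights rarely suit geophysical inputs
    in_channels=4,             # e.g., CHL + covariates + masks (assumed)
    classes=1,                 # one regression output: gap-filled CHL
)

x = torch.randn(8, 4, 128, 128)  # a batch of input patches
y = model(x)                     # shape: (8, 1, 128, 128)
```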
## Datasets
The Indian Ocean Zarr dataset is read anonymously from Google Cloud Storage:

```python
import xarray as xr

# Open the Indian Ocean Zarr store anonymously from Google Cloud Storage.
dataset = xr.open_dataset(
    "gcs://nmfs_odp_nwfsc/CB/mind_the_chl_gap/IO.zarr",
    engine="zarr",
    backend_kwargs={"storage_options": {"token": "anon"}},
    consolidated=True,
)
dataset
```
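Once opened, the usual xarray selection tools apply. A small hedged example follows; the variable name `CHL` is an assumption, so check `dataset.data_vars` for the actual names:

```python
# Hedged sketch: inspect and subset the dataset. "CHL" is an assumed
# variable name; list the real ones with `dataset.data_vars`.
print(list(dataset.data_vars))

chl_2020 = dataset["CHL"].sel(time=slice("2020-01-01", "2020-12-31"))
chl_2020.isel(time=0).plot()  # map of the first day in the window
```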
## Workflow/Roadmap
```mermaid
flowchart TD
A[Zarr data] --> B[Data Preprocessing]
B --> C[Model Fit]
C --> D[Result Visualization]
```
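A hedged sketch of the Model Fit stage is below, assuming a PyTorch `model` (such as the segmentation_models_pytorch U-Net sketched above) and a `train_loader` that yields (input, target, observation-mask) patch batches; the masked loss and hyperparameters are illustrative, not the project's exact configuration:

```python
import torch

def masked_mse(pred, target, mask):
    # Compute MSE only over pixels that were actually observed (mask == 1),
    # so cloud gaps in the target do not contribute to the loss.
    sq_err = (pred - target) ** 2 * mask
    return sq_err.sum() / mask.sum().clamp(min=1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model.train()
for epoch in range(10):                # epoch count is illustrative
    for x, y, mask in train_loader:    # batches of patch tensors
        optimizer.zero_grad()
        loss = masked_mse(model(x), y, mask)
        loss.backward()
        optimizer.step()
```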
## Results/Findings
The project JupyterBook is published at [oceanhackweek.org/ohw25_proj_gap/](https://oceanhackweek.org/ohw25_proj_gap/).

Functions are in the `mindthegap` directory:

```python
import mindthegap as mtg
```
## Lessons Learned
- Working with outdated packages can be quite challenging.
- Existing frameworks (e.g., DINCAE) can serve as inspiration but need to be adapted to the specific context.
- Pay attention to memory efficiency: document how much memory is required to run your code and data.
- Collaboration and thorough documentation help improve workflow efficiency.
- Avoid calling `to_numpy()` on the full dataset (time, lat, lon, var). Instead, stream patches directly from the Zarr files in batches or use dask; see the sketch after this list. Xarray is powerful, with advanced options available in icechunk and cubed.
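A minimal sketch of that streaming pattern is below; the chunk size and batch size are arbitrary assumptions:

```python
import xarray as xr

# Hedged sketch: open the Zarr lazily with dask chunks, then materialize
# only one time-batch at a time instead of the full (time, lat, lon) cube.
ds = xr.open_dataset(
    "gcs://nmfs_odp_nwfsc/CB/mind_the_chl_gap/IO.zarr",
    engine="zarr",
    backend_kwargs={"storage_options": {"token": "anon"}},
    consolidated=True,
    chunks={"time": 32},  # dask-backed; nothing is read until needed
)

batch_size = 32
for t0 in range(0, ds.sizes["time"], batch_size):
    batch = ds.isel(time=slice(t0, t0 + batch_size))
    arr = batch.to_array().to_numpy()  # small: only this batch is loaded
    # ... feed `arr` to the model ...
```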
## References

Barth, A., Alvera-Azcárate, A., Licer, M., and Beckers, J.-M. (2020). DINCAE 1.0: a convolutional neural network with error estimates to reconstruct sea surface temperature satellite observations. *Geoscientific Model Development*, 13(3), 1609–1622.

Barth, A., Alvera-Azcárate, A., Troupin, C., and Beckers, J.-M. (2022). DINCAE 2.0: multivariate convolutional neural network with error estimates to reconstruct sea surface temperature satellite and altimetry observations. *Geoscientific Model Development*, 15(5), 2183–2196.
## Creating the JupyterBook
Create the template in the `book` directory:

```bash
pip install -U jupyter-book
jupyter-book create book
```

Build and push to GitHub. Make sure you are in the `book` directory:

```bash
jupyter-book build .
ghp-import -n -p -f _build/html
```