SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images

SMARTIES learns unified representations of multi-sensor remote sensing (RS) images by leveraging spectrum-aware projections, enabling scalability and generalization to diverse RS sensors, including unseen ones handled in a zero-shot manner.

✨ Key Features

SMARTIES stands out from existing foundation models in several respects:

🛰️ Multi-Sensor Representations

Enable sensor-agnostic processing of RS data (optical, radar, and VHR RGB imagery), including unseen sensors handled in a zero-shot manner.

🌈 Spectrum-Aware Projections

Project data from heterogeneous sensors into a shared spectrum-aware space.
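
Conceptually, this can be pictured as generating each band's patch projection from an encoding of its central wavelength, so bands from any sensor land in one shared token space. The sketch below is a minimal PyTorch illustration of that idea, not the repository's actual implementation; the class name `SpectrumAwareProjection` and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class SpectrumAwareProjection(nn.Module):
    """Hypothetical sketch: embed each band's patches with a projection
    generated from the band's central wavelength, so bands from any
    sensor land in one shared, spectrum-aware token space."""

    def __init__(self, patch_size=16, embed_dim=768, num_freq=64):
        super().__init__()
        self.patch_size = patch_size
        self.embed_dim = embed_dim
        self.num_freq = num_freq
        # Maps a sinusoidal wavelength encoding to per-band projection weights.
        self.weight_gen = nn.Linear(2 * num_freq, patch_size * patch_size * embed_dim)

    def wavelength_encoding(self, wavelengths):
        # Sinusoidal encoding of central wavelengths (micrometers): (C, 2*num_freq)
        freqs = torch.arange(self.num_freq, dtype=torch.float32, device=wavelengths.device)
        angles = wavelengths[:, None] * (10000.0 ** (-freqs / self.num_freq))
        return torch.cat([angles.sin(), angles.cos()], dim=-1)

    def forward(self, x, wavelengths):
        # x: (B, C, H, W) with an arbitrary set of C bands; wavelengths: (C,)
        B, C, H, W = x.shape
        P = self.patch_size
        # Per-band projection matrices: (C, P*P, D)
        w = self.weight_gen(self.wavelength_encoding(wavelengths))
        w = w.view(C, P * P, self.embed_dim)
        # Split each band into non-overlapping P x P patches: (B, C, N, P*P)
        patches = x.unfold(2, P, P).unfold(3, P, P).reshape(B, C, -1, P * P)
        # Project each band's patches and sum over bands: (B, N, D) tokens
        return torch.einsum('bcnp,cpd->bnd', patches, w)
```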

⚡ Lightweight and Scalable

Pretrain a simple yet effective model that demands comparatively little data, with complexity similar to MAE.

🔄 Downstream Transfer

Enable downstream transfer using a unified model across a diverse set of remote sensing sensors and tasks.

🔀 Flexible Band Combinations

Use arbitrary combinations of spectral bands for downstream tasks, enabling flexible remote sensing applications (see the sketch below).
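
Because each band's projection is generated from its wavelength rather than bound to a fixed channel index, the same module accepts any band subset. Continuing the hypothetical sketch above (wavelengths are approximate Sentinel-2 band centers):

```python
proj = SpectrumAwareProjection()

# Visible bands only (red, green, blue central wavelengths in micrometers).
rgb = torch.randn(2, 3, 224, 224)
tokens_rgb = proj(rgb, torch.tensor([0.665, 0.560, 0.490]))

# Add a NIR band at inference time: no architectural change, no re-training.
rgbn = torch.randn(2, 4, 224, 224)
tokens_rgbn = proj(rgbn, torch.tensor([0.665, 0.560, 0.490, 0.842]))

print(tokens_rgb.shape, tokens_rgbn.shape)  # both torch.Size([2, 196, 768])
```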

Abstract

From optical sensors to microwave radars, leveraging the complementary strengths of remote sensing (RS) sensors is crucial for achieving dense spatio-temporal monitoring of our planet. However, recent deep learning models, whether task-specific or foundational, are often tied to single sensors or to fixed sensor combinations: adapting such models to different sensory inputs requires both architectural changes and re-training, limiting scalability and generalization across multiple RS sensors. By contrast, a single model able to modulate its feature representations to accept diverse sensors as input would pave the way to agile and flexible multi-sensor RS data processing. To address this, we introduce SMARTIES, a generic and versatile foundation model that lifts sensor-specific and sensor-dependent efforts and enables scalability and generalization to diverse RS sensors: SMARTIES projects data from heterogeneous sensors into a shared spectrum-aware space, enabling the use of arbitrary combinations of bands both for training and inference. To obtain sensor-agnostic representations, we train a single, unified transformer model that reconstructs masked multi-sensor data with cross-sensor token mixup. On both single- and multi-modal tasks across diverse sensors, SMARTIES outperforms previous models that rely on sensor-specific pretraining.
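
The pretraining recipe combines MAE-style masked reconstruction with cross-sensor token mixup. The snippet below is a rough sketch of the mixup step only, assuming spatially aligned token sequences from two sensors observing the same scene; it is not the paper's exact procedure.

```python
import torch

def cross_sensor_token_mixup(tokens_a, tokens_b, mix_ratio=0.5):
    """Swap a random subset of spatial positions between two sensors'
    token sequences (both (B, N, D)), yielding one mixed-sensor view."""
    B, N, _ = tokens_a.shape
    # Per-sample boolean mask: True -> take this position from sensor B.
    take_b = torch.rand(B, N, device=tokens_a.device) < mix_ratio
    mixed = torch.where(take_b.unsqueeze(-1), tokens_b, tokens_a)
    return mixed, take_b
```

In the MAE-style loop, the mixed sequence would then be masked and the decoder trained to reconstruct the pixels of whichever sensor each token came from.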

📊 Downstream Transfer & Evaluation

| Dataset | Task | Evaluation Type | Sensor(s) Used |
|---------|------|-----------------|----------------|
| BigEarthNet-S2 | Multi-label Classification | Fine-tuning | Sentinel-2 |
| BigEarthNet-S1 | Multi-label Classification | Linear Probing | Sentinel-1 |
| BigEarthNet-MM | Multi-Modal Multi-label Classification | Linear Probing | Sentinel-1, Sentinel-2 |
| EuroSAT | Scene Classification | Fine-tuning | Sentinel-2 |
| EuroSAT | Scene Classification | Linear Probing | Sentinel-2 |
| EuroSAT | Scene Classification | kNN | Sentinel-2 |
| RESISC-45 | Scene Classification | Fine-tuning | RGB |
| WHU-RS19 | Scene Classification | kNN | RGB |
| UC-Merced | Scene Classification | kNN | RGB |
| BurnScars | Semantic Segmentation | UPerNet Probing | HLS |
| DynamicEarthNet | Semantic Segmentation | UPerNet Probing | Planet |
| SpaceNet7 | Semantic Segmentation | UPerNet Probing | Planet |
| SICKLE | Semantic Segmentation | Non-linear Probing | Landsat-8 (OLI, TIRS) |
| DFC2020 | Multi-Modal Semantic Segmentation | Non-linear Probing | Sentinel-1, Sentinel-2 |
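
Several rows above use linear probing: the SMARTIES encoder stays frozen and only a linear classifier is trained on pooled features. The recipe below is a generic sketch, not the repository's evaluation code; `encoder` stands for any frozen backbone that returns a token sequence.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def extract_features(encoder, loader, device='cuda'):
    """Pool frozen encoder tokens into one feature vector per image."""
    encoder.eval()
    feats, labels = [], []
    for x, y in loader:
        tokens = encoder(x.to(device))           # (B, N, D)
        feats.append(tokens.mean(dim=1).cpu())   # global average pooling
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

def linear_probe(train_feats, train_labels, num_classes, epochs=100, lr=1e-3):
    """Train only a linear head on frozen features (full-batch for brevity)."""
    head = nn.Linear(train_feats.shape[1], num_classes)
    opt = torch.optim.AdamW(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(head(train_feats), train_labels).backward()
        opt.step()
    return head
```

The kNN rows follow the same pattern, except the extracted features are classified by nearest-neighbor lookup instead of a trained head.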

BibTeX

@article{smarties,
  title={{SMARTIES}: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images},
  author={Gencer Sumbul and Chang Xu and Emanuele Dalsasso and Devis Tuia},
  journal={arXiv preprint arXiv:2506.19585},
  year={2025}
}