SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images

SMARTIES learns unified representations of multi-sensor remote sensing (RS) images by leveraging spectrum-aware projections, enabling scalability and generalization to diverse RS sensors, including unseen ones handled in a zero-shot manner.

✨ Key Features

SMARTIES stands out from existing foundation models in several respects:

🛰️ Multi-Sensor Representations

Enable sensor-agnostic processing of RS data (optical, radar, and VHR RGB imagery), including unseen sensors handled in a zero-shot manner.

🌈 Spectrum-Aware Projections

Project data from heterogeneous sensors into a shared spectrum-aware space.
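
Conceptually, this can be pictured as generating each band's patch projection from an encoding of its central wavelength, so bands from any sensor land in one shared token space. The sketch below is a minimal PyTorch illustration of that idea, not the repository's actual implementation; the class name `SpectrumAwareProjection` and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class SpectrumAwareProjection(nn.Module):
    """Hypothetical sketch: embed each band's patches with a projection
    generated from the band's central wavelength, so bands from any
    sensor land in one shared, spectrum-aware token space."""

    def __init__(self, patch_size=16, embed_dim=768, num_freq=64):
        super().__init__()
        self.patch_size = patch_size
        self.embed_dim = embed_dim
        self.num_freq = num_freq
        # Maps a sinusoidal wavelength encoding to per-band projection weights.
        self.weight_gen = nn.Linear(2 * num_freq, patch_size * patch_size * embed_dim)

    def wavelength_encoding(self, wavelengths):
        # Sinusoidal encoding of central wavelengths (micrometers): (C, 2*num_freq)
        freqs = torch.arange(self.num_freq, dtype=torch.float32, device=wavelengths.device)
        angles = wavelengths[:, None] * (10000.0 ** (-freqs / self.num_freq))
        return torch.cat([angles.sin(), angles.cos()], dim=-1)

    def forward(self, x, wavelengths):
        # x: (B, C, H, W) with an arbitrary set of C bands; wavelengths: (C,)
        B, C, H, W = x.shape
        P = self.patch_size
        # Per-band projection matrices: (C, P*P, D)
        w = self.weight_gen(self.wavelength_encoding(wavelengths))
        w = w.view(C, P * P, self.embed_dim)
        # Split each band into non-overlapping P x P patches: (B, C, N, P*P)
        patches = x.unfold(2, P, P).unfold(3, P, P).reshape(B, C, -1, P * P)
        # Project each band's patches and sum over bands: (B, N, D) tokens
        return torch.einsum('bcnp,cpd->bnd', patches, w)
```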

⚡ Lightweight and Scalable

Pretrain a simple yet effective model that demands comparatively little data, with complexity similar to MAE.

🔄 Downstream Transfer

Enable downstream transfer using a unified model across a diverse set of remote sensing sensors and tasks.

🔀 Flexible Band Combinations

Use arbitrary combinations of spectral bands for downstream tasks, enabling flexible remote sensing applications (see the sketch below).
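
Because each band's projection is generated from its wavelength rather than bound to a fixed channel index, the same module accepts any band subset. Continuing the hypothetical sketch above (wavelengths are approximate Sentinel-2 band centers):

```python
proj = SpectrumAwareProjection()

# Visible bands only (red, green, blue central wavelengths in micrometers).
rgb = torch.randn(2, 3, 224, 224)
tokens_rgb = proj(rgb, torch.tensor([0.665, 0.560, 0.490]))

# Add a NIR band at inference time: no architectural change, no re-training.
rgbn = torch.randn(2, 4, 224, 224)
tokens_rgbn = proj(rgbn, torch.tensor([0.665, 0.560, 0.490, 0.842]))

print(tokens_rgb.shape, tokens_rgbn.shape)  # both torch.Size([2, 196, 768])
```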

Abstract

From optical sensors to microwave radars, leveraging the complementary strengths of remote sensing (RS) sensors is crucial for achieving dense spatio-temporal monitoring of our planet. However, recent deep learning models, whether task-specific or foundational, are often tied to single sensors or to fixed sensor combinations: adapting such models to different sensory inputs requires both architectural changes and re-training, limiting scalability and generalization across multiple RS sensors. By contrast, a single model able to modulate its feature representations to accept diverse sensors as input would pave the way to agile and flexible multi-sensor RS data processing. To address this, we introduce SMARTIES, a generic and versatile foundation model that lifts sensor-specific and sensor-dependent efforts and enables scalability and generalization to diverse RS sensors: SMARTIES projects data from heterogeneous sensors into a shared spectrum-aware space, enabling the use of arbitrary combinations of bands both for training and inference. To obtain sensor-agnostic representations, we train a single, unified transformer model that reconstructs masked multi-sensor data with cross-sensor token mixup. On both single- and multi-modal tasks across diverse sensors, SMARTIES outperforms previous models that rely on sensor-specific pretraining.
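
The pretraining recipe combines MAE-style masked reconstruction with cross-sensor token mixup. The snippet below is a rough sketch of the mixup step only, assuming spatially aligned token sequences from two sensors observing the same scene; it is not the paper's exact procedure.

```python
import torch

def cross_sensor_token_mixup(tokens_a, tokens_b, mix_ratio=0.5):
    """Swap a random subset of spatial positions between two sensors'
    token sequences (both (B, N, D)), yielding one mixed-sensor view."""
    B, N, _ = tokens_a.shape
    # Per-sample boolean mask: True -> take this position from sensor B.
    take_b = torch.rand(B, N, device=tokens_a.device) < mix_ratio
    mixed = torch.where(take_b.unsqueeze(-1), tokens_b, tokens_a)
    return mixed, take_b
```

In the MAE-style loop, the mixed sequence would then be masked and the decoder trained to reconstruct the pixels of whichever sensor each token came from.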

📊 Downstream Transfer & Evaluation

| Dataset | Task | Evaluation Type | Sensor(s) Used |
|---------|------|-----------------|----------------|
| BigEarthNet-S2 | Multi-label Classification | Fine-tuning | Sentinel-2 |
| BigEarthNet-S1 | Multi-label Classification | Linear Probing | Sentinel-1 |
| BigEarthNet-MM | Multi-Modal Multi-label Classification | Linear Probing | Sentinel-1, Sentinel-2 |
| EuroSAT | Scene Classification | Fine-tuning | Sentinel-2 |
| EuroSAT | Scene Classification | Linear Probing | Sentinel-2 |
| EuroSAT | Scene Classification | kNN | Sentinel-2 |
| RESISC-45 | Scene Classification | Fine-tuning | RGB |
| WHU-RS19 | Scene Classification | kNN | RGB |
| UC-Merced | Scene Classification | kNN | RGB |
| BurnScars | Semantic Segmentation | UPerNet Probing | HLS |
| DynamicEarthNet | Semantic Segmentation | UPerNet Probing | Planet |
| SpaceNet7 | Semantic Segmentation | UPerNet Probing | Planet |
| SICKLE | Semantic Segmentation | Non-linear Probing | Landsat-8 (OLI, TIRS) |
| DFC2020 | Multi-Modal Semantic Segmentation | Non-linear Probing | Sentinel-1, Sentinel-2 |
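
Several rows above use linear probing: the SMARTIES encoder stays frozen and only a linear classifier is trained on pooled features. The recipe below is a generic sketch, not the repository's evaluation code; `encoder` stands for any frozen backbone that returns a token sequence.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def extract_features(encoder, loader, device='cuda'):
    """Pool frozen encoder tokens into one feature vector per image."""
    encoder.eval()
    feats, labels = [], []
    for x, y in loader:
        tokens = encoder(x.to(device))           # (B, N, D)
        feats.append(tokens.mean(dim=1).cpu())   # global average pooling
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

def linear_probe(train_feats, train_labels, num_classes, epochs=100, lr=1e-3):
    """Train only a linear head on frozen features (full-batch for brevity)."""
    head = nn.Linear(train_feats.shape[1], num_classes)
    opt = torch.optim.AdamW(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(head(train_feats), train_labels).backward()
        opt.step()
    return head
```

The kNN rows follow the same pattern, except the extracted features are classified by nearest-neighbor lookup instead of a trained head.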

BibTeX

@article{smarties,
  title={{SMARTIES}: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images},
  author={Gencer Sumbul and Chang Xu and Emanuele Dalsasso and Devis Tuia},
  journal={arXiv preprint arXiv:2506.19585},
  year={2025}
}