Understanding user intent is essential for situational, context-aware decision-making. Motivated by a real-world scenario, this work addresses intent prediction for smart-device users in the vicinity of vehicles by modeling sequential spatiotemporal data. In real-world scenarios, however, environmental factors and sensor limitations can result in non-stationary and irregularly sampled data, posing significant challenges. To address these issues, we propose STaRFormer, a Transformer-based approach that can serve as a universal framework for sequential modeling. STaRFormer utilizes a new dynamic attention-based regional masking scheme combined with a novel semi-supervised contrastive learning paradigm to enhance task-specific latent representations. Comprehensive experiments on 56 datasets, varying in type (including non-stationary and irregularly sampled data), domain, sequence length, number of training samples, and application, demonstrate the efficacy of STaRFormer, which achieves notable improvements over state-of-the-art approaches.
This work addresses the challenges posed by real-world time series, which often exhibit non-stationarity and irregular sampling due to factors such as sensor technology, external conditions, and device malfunctions. Conventional machine learning models, such as LSTMs and Transformers, typically assume that the data are fully observed, stationary, and sampled at regular intervals. We developed STaRFormer, a versatile framework that effectively models time series with these characteristics while remaining applicable to regularly sampled time series as well.
A demonstration of the use case is available here.
The proposed STaRFormer framework introduces dynamic attention-based regional masking and a novel semi-supervised contrastive learning scheme to create task-informed latent embeddings that are robust to irregularities in time series. The approach additionally serves as an effective augmentation method, improving performance across various time series types (including non-stationary and irregularly sampled), domains, and downstream tasks.
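To make the regional-masking idea concrete, here is a minimal sketch in which the contiguous window of timesteps carrying the most attention mass is masked. The function name `regional_mask`, the window-selection rule, and the `mask_ratio` parameter are illustrative assumptions, not STaRFormer's exact algorithm:

```python
import numpy as np

def regional_mask(attn_scores: np.ndarray, mask_ratio: float = 0.25) -> np.ndarray:
    """Mask the contiguous window with the highest total attention mass.

    Illustrative sketch only: STaRFormer's masking is dynamic and
    attention-based, but its exact selection rule may differ.
    """
    T = len(attn_scores)
    w = max(1, int(T * mask_ratio))                 # window length to mask
    window_sums = np.convolve(attn_scores, np.ones(w), mode="valid")
    start = int(np.argmax(window_sums))             # most-attended region
    mask = np.zeros(T, dtype=bool)
    mask[start:start + w] = True                    # True = masked timestep
    return mask
```

Masking the most-attended region forces the model to reconstruct or reason about the inputs it relies on most, which is one plausible motivation for attention-guided (rather than random) masking.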
I. Formulation for sequence-level prediction tasks
Composition of batch-wise (bw) and class-wise (cw) contrastive components.
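As a hedged sketch of how such a composition could look for sequence-level tasks: an unsupervised batch-wise (bw) term over two augmented views plus a supervised class-wise (cw) term over the labeled subset, mixed by a weight. The NT-Xent/SupCon-style losses, the temperature `tau`, and the weight `lam` below are our assumptions for illustration, not STaRFormer's exact objective:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """Batch-wise (bw) term: NT-Xent between two views z1, z2 of shape (N, D)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                                   # (N, N) logits
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                      # positives on the diagonal

def sup_con(z, labels, tau=0.5):
    """Class-wise (cw) term: pull same-label embeddings together (SupCon-style)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                          # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(z), dtype=bool)
    return -np.mean(log_prob[pos])

def combined_loss(z1, z2, labels, lam=0.5, tau=0.5):
    """Hypothetical bw + cw composition; lam is an assumed weighting."""
    return nt_xent(z1, z2, tau) + lam * sup_con(z1, labels, tau)
```

In a semi-supervised setting, the bw term applies to every batch while the cw term contributes only where labels are available, which is one way the two components can coexist.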
II. Formulation for element-wise prediction tasks
Composition of batch-wise (bw), intra-class-wise (cw-intra) and inter-class-wise (cw-inter) contrastive components.
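For the element-wise case, one plausible composition, written with hypothetical weights α and β (not taken from the paper), would be L = L_bw + α · L_cw-intra + β · L_cw-inter, where the intra-class term presumably contrasts elements sharing a class and the inter-class term separates elements of different classes.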
Method | DKT Accuracy↑ | DKT F0.5-Score↑ | GL Accuracy↑ |
---|---|---|---|
RNN | 0.754 ± 0.010 | 0.754 ± 0.010 | 0.643 |
TrajFormer | - | - | 0.855 |
SVM | - | - | 0.861 |
LSTM | 0.844 ± 0.003 | 0.843 ± 0.002 | 0.884 |
GRU | 0.840 ± 0.003 | 0.840 ± 0.003 | 0.898 |
ST-GRU | - | - | 0.913 |
Transformer | 0.849 ± 0.002 | 0.849 ± 0.002 | 0.881 |
TARNet | 0.781 ± 0.011 | 0.782 ± 0.012 | 0.880 |
TimesURL | 0.724 ± 0.003 | - | 0.751 |
STaRFormer | 0.852 ± 0.003 | 0.852 ± 0.003 | 0.932 |
Method | P19 AUROC↑ | P19 AUPRC↑ | P12 AUROC↑ | P12 AUPRC↑ | PAM Accuracy↑ | PAM Precision↑ | PAM Recall↑ | PAM F1-Score↑ |
---|---|---|---|---|---|---|---|---|
Transformer | 80.7 ± 3.8 | 42.7 ± 7.7 | 83.3 ± 0.7 | 47.9 ± 3.6 | 83.5 ± 1.5 | 84.8 ± 1.5 | 86.0 ± 1.2 | 85.0 ± 1.3 |
Trans-mean | 83.7 ± 1.8 | 45.8 ± 3.2 | 82.6 ± 2.0 | 46.3 ± 4.0 | 83.7 ± 2.3 | 84.9 ± 2.6 | 86.4 ± 2.1 | 85.1 ± 2.4 |
GRU-D | 83.9 ± 1.7 | 46.9 ± 2.1 | 81.9 ± 2.1 | 46.1 ± 4.7 | 83.3 ± 1.6 | 84.6 ± 1.2 | 85.2 ± 1.6 | 84.8 ± 1.2 |
SeFT | 81.2 ± 2.3 | 41.9 ± 3.1 | 73.9 ± 2.5 | 31.1 ± 4.1 | 67.1 ± 2.2 | 70.0 ± 2.4 | 68.2 ± 1.5 | 68.5 ± 1.8 |
mTAND | 84.4 ± 1.3 | 50.6 ± 2.0 | 84.2 ± 0.8 | 48.2 ± 3.4 | 74.6 ± 4.3 | 74.3 ± 4.0 | 79.5 ± 2.8 | 76.8 ± 3.4 |
IP-Net | 84.6 ± 1.3 | 38.1 ± 3.7 | 82.6 ± 1.4 | 47.6 ± 3.1 | 74.3 ± 3.8 | 75.6 ± 2.1 | 77.9 ± 2.2 | 76.6 ± 2.8 |
DGM²-O | 86.7 ± 3.4 | 44.7 ± 11.7 | 84.4 ± 1.6 | 47.3 ± 3.6 | 82.4 ± 2.3 | 85.2 ± 1.2 | 83.9 ± 2.3 | 84.3 ± 1.8 |
MTGNN | 81.9 ± 6.2 | 39.9 ± 8.9 | 74.4 ± 6.7 | 35.5 ± 6.0 | 83.4 ± 1.9 | 85.2 ± 1.7 | 86.1 ± 1.9 | 85.9 ± 2.4 |
Raindrop | 87.0 ± 2.3 | 51.8 ± 5.5 | 82.8 ± 1.7 | 44.0 ± 3.0 | 88.5 ± 1.5 | 89.9 ± 1.5 | 89.9 ± 0.6 | 89.8 ± 1.0 |
ViTST | 89.2 ± 2.0 | 53.1 ± 3.4 | 85.1 ± 0.8 | 51.1 ± 4.1 | 95.8 ± 1.3 | 96.2 ± 1.3 | 96.5 ± 1.2 | 96.1 ± 1.1 |
STaRFormer | 89.4 ± 1.3 | 61.3 ± 3.4 | 85.3 ± 1.2 | 52.0 ± 1.7 | 97.6 ± 0.9 | 97.3 ± 0.4 | 97.6 ± 0.3 | 97.4 ± 0.3 |
Metric | ViTST | DTWD | Weasel-Muse | TST (TimesURL) | T-Loss | TS-TCC | TNC | TS2Vec | InfoTS | Rocket | Mini-Rocket | TST (TARNet) | InfoTSs | TimesURL | TARNet | STaRFormer
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Avg. Accuracy↑ | 0.790 | 0.608 | 0.691 | 0.617 | 0.658 | 0.668 | 0.670 | 0.704 | 0.714 | 0.715 | 0.719 | 0.729 | 0.730 | 0.752 | 0.755 | 0.795 |
Rank↓ | - | - | - | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
Avg. Rank↓ | - | - | - | 10.6 | 8.6 | 9.2 | 9.9 | 7.4 | 6.8 | 5.5 | 5.7 | 6.5 | 5.3 | 3.9 | 4.9 | 2.8 |
Top scores↑ | 1 | 0 | 5 | 1 | 1 | 1 | 0 | 1 | 1 | 5 | 4 | 6 | 3 | 4 | 7 | 9 |
1-v-1↑ | 8 | 28 | 20 | 29 | 27 | 27 | 29 | 25 | 27 | 19 | 22 | 23 | 23 | 19 | 21 | - |
DS Count↑ | 10 | 29 | 28 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
Avg. Accuracy 28↑ | - | 0.604 | 0.691 | 0.631 | 0.675 | 0.680 | 0.677 | 0.713 | 0.722 | 0.730 | 0.733 | 0.724 | 0.738 | 0.760 | 0.770 | 0.793 |
Rank 28↓ | - | 15 | 10 | 14 | 13 | 11 | 12 | 9 | 8 | 6 | 5 | 7 | 4 | 3 | 2 | 1 |
Avg. Rank 28↓ | - | 11.2 | 7.8 | 11.7 | 9.1 | 10.3 | 11.0 | 8.1 | 7.5 | 5.8 | 6.0 | 7.5 | 5.8 | 4.1 | 5.2 | 3.1 |
Avg. Accuracy 9↑ | 0.776 | 0.702 | 0.737 | 0.674 | 0.717 | 0.708 | 0.715 | 0.734 | 0.727 | 0.756 | 0.751 | 0.771 | 0.736 | 0.770 | 0.717 | 0.793 |
Rank 9↓ | 2 | 15 | 7 | 16 | 12 | 14 | 13 | 9 | 10 | 5 | 6 | 3 | 8 | 4 | 11 | 1 |
Avg. Rank 9↓ | 6.4 | 11.8 | 9.0 | 12.3 | 11.1 | 11.3 | 10.8 | 9.0 | 10.0 | 6.7 | 7.4 | 3.9 | 8.4 | 5.3 | 6.3 | 3.3 |
Method | Yahoo F1-Score↑ | Yahoo Precision↑ | Yahoo Recall↑ | KPI F1-Score↑ | KPI Precision↑ | KPI Recall↑
---|---|---|---|---|---|---
SPOT | 0.338 | 0.269 | 0.454 | 0.217 | 0.786 | 0.126
DSPOT | 0.316 | 0.241 | 0.458 | 0.521 | 0.623 | 0.447
DONUT | 0.026 | 0.013 | 0.825 | 0.347 | 0.371 | 0.326
SR | 0.563 | 0.451 | 0.747 | 0.622 | 0.647 | 0.598
TS2Vec | 0.745 | 0.729 | 0.762 | 0.677 | 0.929 | 0.533
TimesURL | 0.749 | 0.748 | 0.750 | 0.688 | 0.925 | 0.546
STaRFormer | 0.789 | 0.772 | 0.807 | 0.830 | 0.852 | 0.811
Metric | FPCR | FPCR-Bspline | SVR | SVR Optimised | Random Forest | XGBoost | 1-NN-ED | 5-NN-ED | 1-NN-DTWD | 5-NN-DTWD | Rocket | FCN | ResNet | Inception | TARNet | STaRFormer
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Avg. Rel. Mean Difference ↓ | 0.028 | 0.029 | 0.387 | 0.208 | -0.121 | -0.132 | 0.288 | 0.051 | 0.125 | -0.034 | -0.245 | -0.160 | -0.119 | -0.220 | 0.170 | -0.254 |
Avg. Rel. Mean Rank ↓ | 9 | 10 | 16 | 14 | 6 | 5 | 15 | 11 | 12 | 8 | 2 | 4 | 7 | 3 | 13 | 1 |
Top Scores ↑ | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 6 | 1 | 1 | 3 | 0 | 9 |
t-SNE visualizations of STaRFormer's latent space Z. For datasets where our contrastive learning approach is highly effective (e.g., PS), distinct class clusters are clearly visible.
@misc{2504.10097,
Author = {Maximilian Forstenhäusler and Daniel Külzer and Christos Anagnostopoulos and Shameem Puthiya Parambath and Natascha Weber},
Title = {STaRFormer: Semi-Supervised Task-Informed Representation Learning via Dynamic Attention-Based Regional Masking for Sequential Data},
Year = {2025},
Eprint = {arXiv:2504.10097},
}