Smart Forecasting: Handling Missing Air Quality Data Using Bayesian Deep Learning

Researchers from Sun Yat-sen University developed CGLU-BNF, a Bayesian deep learning framework that predicts air quality from missing or noisy data by combining graph attention, Fourier features, and uncertainty quantification. Tested on London and Hong Kong datasets, it outperformed the competing models it was compared against, delivering sharper, more reliable forecasts under several data loss conditions.


CoE-EDP, VisionRI | Updated: 06-11-2025 14:14 IST | Created: 06-11-2025 14:14 IST

Researchers from the School of Intelligent Systems Engineering at Sun Yat-sen University in Shenzhen, along with the Guangdong Provincial Key Laboratory of Intelligent Transportation System and the Guangdong Provincial Engineering Research Center for Traffic Environmental Monitoring and Control, have pioneered a major leap in air quality forecasting. Their study, "Tackling Incomplete Data in Air Quality Prediction: A Bayesian Deep Learning Framework for Uncertainty Quantification," proposes an innovative system called the Channel Gated Learning Unit–Based Spatio-Temporal Bayesian Neural Field (CGLU-BNF). Designed by Yuzhuang Pian, Taiyu Wang, Shiqi Zhang, Rui Xu, and Yonghong Liu, the framework directly tackles the issue of missing and inconsistent air quality data, a common obstacle caused by sensor failures, uneven spatial coverage, and unstable data transmission that can lead to missing rates of up to 95%.

From Patchwork Solutions to Unified Learning

Traditionally, researchers handled missing data using a two-step process: first filling in the gaps through statistical or generative imputation, and then applying predictive models to the completed dataset. While methods using GANs, VAEs, or hybrid Transformer-GAN architectures produced promising results, they suffered from disjointed optimization and error propagation. End-to-end models, on the other hand, attempted to unify these steps but often ignored uncertainty or required immense computing power. Gaussian Process–based models provided more reliable uncertainty estimates but were slow and reliant on expert-designed kernels. The CGLU-BNF framework bridges these gaps by merging the interpretability of Bayesian inference with the expressive capacity of deep neural networks, allowing it to forecast directly from incomplete inputs while quantifying confidence in every prediction.

How CGLU-BNF Works

At its core, the model features three interlinked components. The first, the multilevel spatio-temporal encoder, extracts patterns using Fourier temporal harmonics, spatial embeddings, and a Graph Attention Network (GAT) that models relationships between monitoring stations. This setup allows the network to learn from irregular or sparse data while preserving long-range dependencies. The second, the Channel Gated Learning Unit, combines adaptive gating, residual connections, and a learnable activation (a dynamic blend of ELU and Tanh functions) to suppress noise and highlight critical features. The third, a Bayesian inference layer, performs multi-particle optimization to generate both mean predictions and calibrated uncertainty intervals, keeping the model transparent and statistically grounded.
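
The two simplest ingredients named above, Fourier temporal harmonics and a learnable ELU/Tanh blend inside a gated residual unit, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the choice of periods, the scalar inputs, and the use of a sigmoid to turn a trainable value into a mixing weight are all hypothetical.

```python
import math

def fourier_features(t, periods, n_harmonics=2):
    """Sine/cosine features of time index t for each period and harmonic."""
    feats = []
    for period in periods:
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * math.pi * k * t / period
            feats.append(math.sin(angle))
            feats.append(math.cos(angle))
    return feats

def elu(x, alpha=1.0):
    """Exponential linear unit."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def blended_activation(x, mix_logit):
    """Convex blend of ELU and Tanh.

    mix_logit stands in for a trained parameter (an assumption of this
    sketch); the sigmoid keeps the mixing weight inside (0, 1).
    """
    w = 1.0 / (1.0 + math.exp(-mix_logit))
    return w * elu(x) + (1.0 - w) * math.tanh(x)

def gated_unit(x, gate_logit, mix_logit):
    """Scale the activated signal by a gate, then add a residual connection."""
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    return x + gate * blended_activation(x, mix_logit)

# Hourly index 5 encoded with assumed daily (24 h) and weekly (168 h) periods.
hour_features = fourier_features(5, periods=[24, 168])
```

Because the features are periodic, an hour and the same hour one day later receive identical daily components, which is what lets a model carry a learned cycle across a gap in the record.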

Performance Across Diverse Conditions

To test its performance, the team used two large datasets, London (2018–2019) and Hong Kong (2023–2024), which represented different environmental conditions and degrees of data completeness. They simulated four types of missingness (random, node-level, timestamp-based, and spatio-temporal block missing), each at rates between 10% and 80%. The model led in every setting. In London, CGLU-BNF achieved an RMSE of 7.26 µg/m³ and an MAE of 4.04 µg/m³, outperforming all competing models by about 7%. In Hong Kong, the RMSE was 3.22 µg/m³, with an uncertainty range up to 19% sharper than that of the second-best model, the Bayesian Neural Field. Even with 30% of data missing, CGLU-BNF maintained stable results, showing only marginal performance loss compared with complete datasets. Its strength came from its ability to use spatial and temporal harmonics to infer missing data rather than simply guessing it.
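
The four missingness patterns and the two error metrics mentioned above can be sketched as follows. The helpers are hypothetical and only illustrate the idea; how the study actually sampled stations, timestamps, or blocks is not described here, so the sampling choices below are assumptions.

```python
import math
import random

def random_mask(n_times, n_nodes, rate, rng):
    """True = observed. Each entry is dropped independently with probability rate."""
    return [[rng.random() >= rate for _ in range(n_nodes)] for _ in range(n_times)]

def node_mask(n_times, n_nodes, rate, rng):
    """A share of stations is offline for the whole period."""
    dropped = set(rng.sample(range(n_nodes), round(rate * n_nodes)))
    return [[j not in dropped for j in range(n_nodes)] for _ in range(n_times)]

def timestamp_mask(n_times, n_nodes, rate, rng):
    """A share of timestamps is missing at every station simultaneously."""
    dropped = set(rng.sample(range(n_times), round(rate * n_times)))
    return [[t not in dropped] * n_nodes for t in range(n_times)]

def block_mask(n_times, n_nodes, t0, t1, nodes):
    """A contiguous window [t0, t1) is missing on the chosen stations."""
    nodes = set(nodes)
    return [[not (t0 <= t < t1 and j in nodes) for j in range(n_nodes)]
            for t in range(n_times)]

def masked_errors(y_true, y_pred, eval_mask):
    """RMSE and MAE over the entries where eval_mask is True."""
    errs = [p - y
            for row_y, row_p, row_m in zip(y_true, y_pred, eval_mask)
            for y, p, m in zip(row_y, row_p, row_m) if m]
    if not errs:
        raise ValueError("eval_mask selects no entries")
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae

rng = random.Random(0)
mask = node_mask(n_times=6, n_nodes=4, rate=0.5, rng=rng)
```

In an experiment of this kind, the mask hides values from the model's input, and the errors are then computed on entries whose true values are known.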

Node-missing tests, where entire stations were offline, demonstrated the value of the GAT layer, which aggregated multi-hop spatial information from neighboring sensors to fill local gaps. Timestamp missing scenarios, where all stations lost data simultaneously, showed that the Fourier harmonics allowed the model to capture periodic cycles and restore temporal continuity. The toughest test, spatio-temporal block missingness, exposed other models' weaknesses but confirmed CGLU-BNF's resilience, as it combined global spatial trends with local station dependencies to reconstruct missing patterns effectively.
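
To make the neighbor-aggregation idea concrete, here is a minimal single-hop sketch in the style of graph attention: an offline station receives a softmax-weighted average of readings from its observed neighbors. It is a simplification under stated assumptions, not the model's GAT layer. The station embeddings, the attention vector `attn_vec`, and the function name are hypothetical, and a real GAT layer would operate on learned feature vectors and be stacked to reach multi-hop neighbors.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_fill(embeddings, readings, neighbors, target, attn_vec):
    """Estimate the target station's value from its observed neighbors.

    Scores follow the GAT pattern LeakyReLU(a . [h_target || h_neighbor]),
    normalized with a softmax over the observed neighbors only.
    Returns None when no neighbor has a reading.
    """
    observed = [j for j in neighbors[target] if readings[j] is not None]
    if not observed:
        return None
    scores = []
    for j in observed:
        pair = embeddings[target] + embeddings[j]  # list concatenation
        scores.append(leaky_relu(sum(a * h for a, h in zip(attn_vec, pair))))
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return sum(e / total * readings[j] for e, j in zip(exps, observed))

# Station 0 is offline; stations 1 and 2 report (illustrative numbers only).
embeddings = {0: [0.1], 1: [0.2], 2: [0.9]}
readings = {0: None, 1: 10.0, 2: 20.0}
neighbors = {0: [1, 2]}
estimate = attention_fill(embeddings, readings, neighbors, 0, attn_vec=[1.0, 1.0])
```

Because the weights are a softmax, the estimate always lies between the smallest and largest neighboring reading.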

Insights, Impacts, and Future Directions

Ablation studies confirmed each module's contribution: removing spatial Fourier features doubled prediction intervals, while eliminating temporal harmonics increased errors by nearly 8%. The learnable activation proved superior to fixed nonlinearities, and channel attention enhanced robustness by dynamically adjusting feature importance. Forecast horizon tests showed that the model had captured weekly cycles: errors peaked around seven days and declined as longer-term seasonality stabilized. The inclusion of external pollutants, such as PM₂.₅ and NO₂, improved predictions further; PM₂.₅ was the strongest companion variable for PM₁₀ due to shared emission sources, while NO₂ reflected traffic-related trends. Overall, CGLU-BNF outperformed all competitors not only in accuracy but also in providing tighter, more trustworthy uncertainty estimates.
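
Claims about "tighter" intervals rest on two quantities: how wide the prediction intervals are (sharpness) and how often they contain the observed value (coverage). The sketch below shows one common way to obtain both from an ensemble of predictions. It assumes a Gaussian interval of mean ± 1.96 standard deviations across ensemble members; the article does not say how CGLU-BNF forms its intervals, so that choice and the function names are assumptions.

```python
import math

def ensemble_interval(particle_preds, z=1.96):
    """Mean and an approximate 95% interval from an ensemble of predictions."""
    k = len(particle_preds)
    mean = sum(particle_preds) / k
    # Sample variance across ensemble members (zero for a single member).
    var = sum((p - mean) ** 2 for p in particle_preds) / (k - 1) if k > 1 else 0.0
    half_width = z * math.sqrt(var)
    return mean, mean - half_width, mean + half_width

def sharpness_and_coverage(intervals, y_true):
    """Average interval width and the share of observations inside the interval."""
    widths = [hi - lo for _, lo, hi in intervals]
    hits = [lo <= y <= hi for (_, lo, hi), y in zip(intervals, y_true)]
    return sum(widths) / len(widths), sum(hits) / len(hits)

# Three ensemble members forecasting one station-hour (illustrative values).
mean, lower, upper = ensemble_interval([11.0, 12.5, 12.0])
```

A narrower interval is only an improvement if coverage stays near its nominal level, which is why the two numbers are usually reported together.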

The research, supported by the Fundamental Research Key Program of Shenzhen and the Guangzhou National Games Air Quality Enhancement Project, marks a shift toward intelligent, data-resilient environmental modeling. By enabling reliable predictions even with incomplete observations, the CGLU-BNF framework paves the way for more adaptive, risk-aware urban air quality management and holds promise for future applications in mobile and vehicle-based pollution monitoring networks.
