Powerful AI–big data system predicts energy markets in real time
A new study from the Instituto Nacional de Electricidad y Energías Limpias, Mexico, has unveiled an advanced AI–Big Data analytics platform that redefines how modern power grids handle energy forecasting, cost optimization, and operational decision-making. The system integrates artificial intelligence and machine learning into a data lake architecture, using open-source tools like Hadoop and Spark for scalable, cost-effective electricity forecasting.
The research, titled "AI–Big Data Analytics Platform for Energy Forecasting in Modern Power Systems," published in Big Data and Cognitive Computing, presents an on-premises, horizontally scalable analytics framework tailored for national energy operators. It enables real-time processing of vast electricity market data to forecast prices, detect anomalies, and plan grid operations with precision.
Building an intelligent foundation for power forecasting
The study analyses the digital transformation wave sweeping through global energy systems. Modern electricity markets generate immense volumes of heterogeneous data (sensor readings, transaction records, environmental inputs, and real-time price signals) that traditional analytics tools struggle to process. The authors propose a data lake-driven architecture as the backbone of a new analytics ecosystem, highlighting flexibility, scalability, and affordability.
This AI–Big Data platform integrates Hadoop for distributed storage, Hive for data querying, Spark for high-speed analytics, and Airflow for automation. Anaconda Navigator and Jupyter notebooks provide data science and model development environments. The platform supports structured, semi-structured, and unstructured data sources and allows parallel, distributed machine learning computations across multiple regional nodes.
By consolidating grid data into a unified data lake, the infrastructure supports advanced predictive analytics tasks like electricity price forecasting, demand prediction, and fault detection, enabling decision-makers to act on near-real-time intelligence rather than static reports.
How AI models learn to predict electricity prices
The research team conducted a comparative study of statistical, machine learning, and deep learning models to forecast electricity prices in Mexico's day-ahead market. They evaluated nine forecasting models: two statistical (ARIMA, BATS), six machine learning (Random Forest, Gradient Boosting, LightGBM, XGBoost, Support Vector Regression, Artificial Neural Network), and one deep learning model (Long Short-Term Memory, LSTM).
Each model was trained on 35,000 hourly historical records per grid node, collected between 2021 and 2024 and covering variables such as temperature, fuel prices, congestion, and transmission losses. Hyperparameters were tuned with grid search cross-validation, and the neural networks were trained with the Adam optimizer. The study used Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and R² as evaluation metrics.
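For readers unfamiliar with the four evaluation metrics, the sketch below shows one common way to compute them in plain Python. It is an illustration only, not the authors' code: the function name `forecast_metrics` and the sample prices are assumptions made for this example, and the study does not say how it handled hours with zero prices, where MAPE is undefined.

```python
import math

def forecast_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE (in percent), and R^2 for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # MAPE divides by the actual value, so this sketch rejects zero actuals.
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero actual values")
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R^2 compares residual error with the variance of the actual series.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}

# Hypothetical hourly prices (illustrative values, not taken from the study).
actual = [100.0, 200.0, 400.0, 500.0]
predicted = [110.0, 190.0, 400.0, 480.0]
metrics = forecast_metrics(actual, predicted)
print(metrics)
```

MAE and RMSE are expressed in the same units as the price, while MAPE and R² are unitless, which is why MAPE is convenient for comparing nodes with different price levels.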
Results revealed that Support Vector Regression (SVR) and XGBoost (XGB) consistently outperformed other models, achieving MAPE values between 1% and 4% across different forecasting horizons. While deep learning approaches like LSTM showed promise, they required more diverse external variables (e.g., weather, demand patterns) to match ML performance. The researchers note that each node exhibited distinct data characteristics, making localized model training more effective than a single, system-wide model.
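The idea of choosing a model per node rather than one system-wide model can be sketched as a simple lookup over validation scores. This is a hedged illustration: the helper `best_model_per_node`, the node identifiers, and the MAPE numbers are all hypothetical and are not results from the study.

```python
def best_model_per_node(mape_by_node):
    """Map each node to the model name with the lowest validation MAPE."""
    return {
        node: min(scores, key=scores.get)
        for node, scores in mape_by_node.items()
    }

# Hypothetical validation MAPE values in percent; node IDs and numbers are made up.
validation_mape = {
    "node_A": {"SVR": 1.8, "XGBoost": 2.1, "LSTM": 3.9},
    "node_B": {"SVR": 3.2, "XGBoost": 2.4, "LSTM": 4.5},
}
selected = best_model_per_node(validation_mape)
print(selected)
```

In this toy input the two nodes end up with different winners, which is the situation the researchers describe when they say localized training works better than a single model.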
Once validated, the best-performing models for each node were operationalized into the Big Data platform. Forecasting processes were automated through Apache Airflow, enabling hourly, daily, and multi-day predictions. The outputs were visualized in real time through Power BI dashboards, where analysts could compare predicted and actual electricity prices for each node within minutes of data arrival.
The implementation achieved a forecasting turnaround time of about 15 minutes per node, updating automatically every 24 hours using a rolling four-week historical data window. This continuous retraining ensures adaptability to changing market conditions and maintains forecast accuracy.
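A rolling four-week training window with a 24-hour refresh can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the function `training_window`, the half-open interval convention, and the synthetic price series are choices made for this example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(weeks=4)            # rolling four-week history
RETRAIN_EVERY = timedelta(hours=24)    # refresh interval described in the article

def training_window(records, run_time, window=WINDOW):
    """Return the (timestamp, price) records in [run_time - window, run_time)."""
    start = run_time - window
    return [(ts, price) for ts, price in records if start <= ts < run_time]

# Hypothetical hourly series covering the six weeks before the run (synthetic prices).
run_time = datetime(2024, 3, 1, 0, 0)
records = [
    (run_time - timedelta(hours=h), 500.0 + h % 24)
    for h in range(1, 6 * 7 * 24 + 1)
]

window_records = training_window(records, run_time)
next_run = run_time + RETRAIN_EVERY
print(len(window_records), next_run)
```

Each daily run drops the oldest 24 hours and picks up the newest 24, so the model always sees the same amount of history while tracking recent market conditions.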
Operational and strategic implications for energy markets
The research demonstrates significant economic and operational implications. In competitive electricity markets, precise forecasting drives more efficient bidding, risk management, and pricing strategies. The platform allows national grid operators to make data-driven decisions on energy distribution, congestion management, and generation scheduling.
The study also identifies broader applications of the platform for renewable energy integration, load management, and asset reliability monitoring. By combining AI and Big Data, the system can enhance the stability and sustainability of national power grids while reducing operational costs.
Importantly, the authors note that XGBoost and SVR models performed best due to their adaptability to short-term fluctuations, a key requirement for volatile electricity markets. They found that deep learning models were often unnecessary for these tasks, as simpler ML algorithms achieved high accuracy with lower computational demands.
Moving ahead, the researchers plan to extend the platform's capabilities to forecast demand, consumption, and renewable generation, while incorporating fault detection and pattern recognition models to improve grid resilience.
FIRST PUBLISHED IN: Devdiscourse