24–48 Hour Ahead Demand Prediction for the SOCO Grid Region
📍 Southeastern US (SOCO) · 📅 2015–2021 · ⏱ Hourly Resolution
What This Project Does
This project builds a production-quality time series forecasting pipeline to predict
electricity demand 24–48 hours ahead for the Southern Company (SOCO) balancing authority —
one of the largest electric utilities in the southeastern United States.
Accurate short-term load forecasting is critical for grid stability, dispatch scheduling,
and cost optimization. A 1% improvement in forecast accuracy can translate into millions of
dollars in avoided over-generation costs and reduced reliance on expensive peaker plants.
The pipeline combines weather data from 7 SOCO-region cities, 48 engineered
temporal, calendar, and weather features, and two Optuna-tuned models — Facebook Prophet
and XGBoost — evaluated rigorously against a SARIMAX baseline on a held-out test set.
Project KPIs
📈 Data Points Analyzed
92,833
Hourly · 2015–2021
🔧 Features Engineered
48
Lags · Rolling · HDD/CDH · Cyclical
🎯 Best RMSE (MWh)
1,147
XGBoost (Tuned)
📉 MAPE
2.90%
Model Performance at a Glance
💡 Real-World Impact
A 37% RMSE reduction over the SARIMAX baseline means grid operators
receive significantly tighter demand estimates 24–48 hours ahead. For a utility serving millions
of customers, this precision enables more efficient unit commitment, reduces spinning reserve
requirements, and lowers operational costs — directly benefiting ratepayers.
Tech Stack
🐍
Python 3.11
Core language
🐼
Pandas
Data wrangling
🌲
XGBoost
Gradient-boosted trees
📈
Prophet
Time series model
📊
Statsmodels
SARIMAX baseline
🔬
Optuna
Hyperparameter tuning
🔄
MLflow
Experiment tracking
🎛️
Streamlit
Portfolio app
📉
Plotly
Interactive charts
🧮
scikit-learn
Metrics & pipeline
☁️
NOAA API
Weather data
⚡
EIA-930
Energy operations data
🔗
PUDL
Data imputation
Dataset Summary
📋 Dataset Details
Energy Data (EIA-930)
· Source: US Energy Information Administration
· SOCO balancing authority region
· Hourly demand, generation, interchange
· Data imputation via PUDL toolkit
Weather Data (NOAA)
· 7 representative SOCO cities
· Temperature, humidity, dew point
· Solar radiation, wind speed
· Computed HDD/CDH per city + region mean
24 pre-computed figures covering demand patterns, data quality, feature engineering,
demand drivers, seasonality, and weather relationships.
Model Results
⚖️ Model Comparison Takeaways
A well-specified SARIMAX(1,1,1)(1,1,1,24) with rolling 24-h evaluation outperforms
tuned Prophet (RMSE 1,829 vs 1,956 MWh) — demonstrating that classical models remain
competitive when evaluated in their natural day-ahead operating regime. XGBoost wins by
exploiting 43 lag, weather, and calendar features simultaneously (RMSE 1,147 MWh, −37% vs SARIMAX).
Full Metrics Table
Model              RMSE (MWh)   MAE (MWh)   MAPE    Description
XGBoost (Tuned)    1,147        —           2.90%   Gradient-boosted trees, 43 features
SARIMAX baseline   1,829        —           4.82%   Rolling 24-h day-ahead evaluation
Prophet (Tuned)    1,956        —           —       Additive time series model
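The three headline metrics can be reproduced in a few lines of NumPy and scikit-learn; the demand values below are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual vs. predicted hourly demand (MWh)
actual = np.array([30_000.0, 31_500.0, 29_800.0])
pred = np.array([29_500.0, 32_000.0, 30_100.0])

rmse = mean_squared_error(actual, pred) ** 0.5    # root mean squared error
mae = mean_absolute_error(actual, pred)           # mean absolute error
mape = np.mean(np.abs((actual - pred) / actual))  # mean absolute percentage error
```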
Showing the first 60 days of the held-out
test period (18,567 hours total). Use the slider to zoom in or out.
Per-model MAE and RMSE for Prophet, XGBoost, and SARIMAX, computed over an adjustable window.
Residual sign convention: Residual = Actual − Predicted.
Negative residuals = model overpredicted demand.
Positive residuals = model underpredicted demand.
Residual Summary Metrics
Mean residual (bias), standard deviation, MAE, and the worst over- and under-predictions, computed for the selected window.
Full test-period view with range selector and drag-to-zoom. Use the buttons or
drag the range slider below the chart to navigate.
Methodology
End-to-End Pipeline
📥
Data Ingestion
EIA-930 + NOAA API
🔧
Preprocessing
PUDL imputation · timezone fix
⚗️
Feature Engineering
43–48 features
📊
SARIMAX Baseline
Classical reference
🔍
Optuna Tuning
Prophet + XGBoost
🏆
XGBoost (Tuned)
RMSE 1,147 MWh · MAPE 2.90%
Project Phases
Phase 1
Data Collection & Storage
Pulled 7-year hourly EIA-930 demand series and NOAA weather for 7 SOCO-region cities. Applied PUDL imputation to fill gaps. Unified on UTC timestamps.
Phase 2
Exploratory Analysis
Identified strong 24h, 168h, and annual seasonality. Confirmed U-curve demand–temperature relationship. Flagged humidity and holiday effects.
Phase 3
SARIMAX Baseline
Fit SARIMAX(1,1,1)(1,1,1,24) with seasonal differencing, enforce_stationarity=True, and HDD/CDH/holiday exogenous variables. Evaluated with a rolling 24-h horizon. RMSE: 1,829 MWh · MAPE: 4.82%. Notably outperforms tuned Prophet.
Phase 4
Feature Engineering
Built 48 features: lag-24h/48h/168h, rolling 24h/168h stats, HDD/CDH per city, cyclical sin/cos encodings, and temperature × hour interaction terms.
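A condensed pandas sketch of the lag, rolling, and cyclical features; the data and column names are illustrative, not the project's actual code:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly demand frame
df = pd.DataFrame(
    {"demand": np.random.default_rng(1).normal(30_000, 2_000, 24 * 21)},
    index=pd.date_range("2021-01-01", periods=24 * 21, freq="h"),
)

# Lag features: same hour yesterday, two days ago, and one week ago
for lag in (24, 48, 168):
    df[f"demand_lag_{lag}h"] = df["demand"].shift(lag)

# Rolling stats over the past day/week, shifted by 1 h to avoid leakage
for window in (24, 168):
    df[f"demand_roll_mean_{window}h"] = df["demand"].shift(1).rolling(window).mean()

# Cyclical sin/cos encoding of hour-of-day
hour = df.index.hour
df["hour_sin"] = np.sin(2 * np.pi * hour / 24)
df["hour_cos"] = np.cos(2 * np.pi * hour / 24)
```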
Phase 5
Optuna Hyperparameter Tuning
Ran 50+ trials each for Prophet (CV-based RMSE) and XGBoost (time-series CV). Best XGBoost params: max_depth=6, n_estimators=790, learning_rate=0.058.
Phase 6
Evaluation & MLflow Tracking
Evaluated all models on the held-out 20% test set (18,567 hours). XGBoost wins with RMSE 1,147 MWh and MAPE 2.90% — 37% improvement over SARIMAX (rolling 24-h). SARIMAX outperforms tuned Prophet on all three metrics.
Key Design Decisions
80/20 Chronological Split
No shuffling — the split respects temporal order to prevent data leakage.
Train: 74,266 hours. Test: 18,567 hours (last ~2.1 years).
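The split itself is trivial to implement; the discipline is in never calling shuffle. A sketch:

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, test_frac: float = 0.2):
    """Split a time-ordered frame into train/test with no shuffling."""
    cut = int(len(df) * (1 - test_frac))
    return df.iloc[:cut], df.iloc[cut:]

# Toy example: 100 hourly rows -> 80 train / 20 test, order preserved
df = pd.DataFrame({"demand": range(100)},
                  index=pd.date_range("2021-01-01", periods=100, freq="h"))
train, test = chronological_split(df)
```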
Lag Features as Primary Predictors
demand_lag_24h and demand_lag_168h are the two highest-correlated features.
They capture daily and weekly periodicity more directly than cyclical encodings.
Weather Aggregation Strategy
Per-city HDD/CDH retained alongside region-mean values. City-level granularity
captures spatial variation in grid load across the SOCO service territory.
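A sketch of the per-city degree-hour computation, assuming the common 65 °F base temperature; city names, columns, and values are illustrative:

```python
import pandas as pd

# Hypothetical per-city hourly temperatures (°F)
temps = pd.DataFrame(
    {"atlanta_temp": [50.0, 70.0, 85.0], "birmingham_temp": [48.0, 68.0, 90.0]},
    index=pd.date_range("2021-07-01", periods=3, freq="h"),
)
BASE = 65.0  # assumed degree-day base temperature

for city in temps.columns:
    name = city.replace("_temp", "")
    # Heating/cooling degree-hours: load drivers below/above the base temp
    temps[f"{name}_hdh"] = (BASE - temps[city]).clip(lower=0)
    temps[f"{name}_cdh"] = (temps[city] - BASE).clip(lower=0)

# Region-mean degree-hours kept alongside the per-city columns
temps["region_hdh"] = temps.filter(like="_hdh").mean(axis=1)
```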
Optuna Over Grid Search
Tree-structured Parzen Estimator (TPE) sampler efficiently explored the
continuous hyperparameter space — far cheaper than exhaustive grid search
given XGBoost's 8-dimensional parameter space.
Lessons Learned
Lesson 01
CV Results Don't Always Predict Test-Set Winners
Prophet had a lower cross-validation RMSE during tuning than XGBoost, but XGBoost
significantly outperformed it on the held-out test set. This highlights the importance
of a true out-of-sample evaluation — CV scores are model selection tools, not final verdicts.
Lesson 02
Timezone Handling is a First-Class Engineering Concern
EIA-930 and NOAA data arrive in different timezones. A silent timezone mismatch
creates a systematic 5-hour shift in demand-weather alignment — one of the most
damaging data bugs possible in this domain. UTC everywhere, validate early.
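A minimal illustration of the failure mode and the fix; timestamps and values here are fabricated:

```python
import pandas as pd

# Hypothetical: demand stamped in US Eastern, weather already in UTC
eia = pd.DataFrame(
    {"demand": [31_000, 30_500]},
    index=pd.date_range("2021-01-01 00:00", periods=2, freq="h",
                        tz="US/Eastern"),
)
noaa = pd.DataFrame(
    {"temp": [41.0, 40.0]},
    index=pd.date_range("2021-01-01 05:00", periods=2, freq="h", tz="UTC"),
)

# Normalize to UTC before joining: a naive join on local timestamps
# would silently shift demand vs. weather by 5 hours
merged = eia.tz_convert("UTC").join(noaa, how="inner")
```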
Lesson 03
Lag Features Are Both the Most Powerful and Most Dangerous
demand_lag_24h is the strongest predictor — but it also creates the highest risk
of data leakage if the split is not strictly chronological. The 80/20 temporal
split with no shuffling was non-negotiable.
Lesson 04
Prophet's Built-In Diagnostics Are Production-Ready
Prophet's cross-validation and component plots (trend, weekly, yearly, holidays)
are genuinely useful for stakeholder communication — not just model development.
The decomposition makes seasonality immediately interpretable.
Lesson 05
Portfolio Apps Are Themselves Engineering Products
Building both a Streamlit app and this static HTML version forced clear separation
between data generation (Python scripts), model artefacts (JSON/CSV), and presentation
(app layer). This separation made both apps easier to maintain and deploy independently.