24–48 Hour Ahead Demand Prediction for the SOCO Grid Region
📍 Southeastern US (SOCO) · 📅 2015–2021 · ⏱ Hourly Resolution
What This Project Does
This project builds a production-quality time series forecasting pipeline to predict
electricity demand 24–48 hours ahead for the Southern Company (SOCO) balancing authority —
one of the largest electric utilities in the southeastern United States.
Accurate short-term load forecasting is critical for grid stability, dispatch scheduling,
and cost optimization. A 1% improvement in forecast accuracy can translate into millions of
dollars in avoided over-generation costs and reduced reliance on expensive peaker plants.
The pipeline combines weather data from 7 SOCO-region cities, 48 engineered
temporal, calendar, and weather features, and two Optuna-tuned models — Facebook Prophet
and XGBoost — evaluated rigorously against a SARIMAX baseline on a held-out test set.
Project KPIs
📈 Data Points Analyzed
92,833
Hourly · 2015–2021
🔧 Features Engineered
48
Lags · Rolling · HDD/CDH · Cyclical
🎯 Best RMSE (MWh)
1,147
XGBoost (Tuned)
📉 MAPE
2.90%
Model Performance at a Glance
💡 Real-World Impact
A 37% RMSE reduction over the SARIMAX baseline means grid operators
receive significantly tighter demand estimates 24–48 hours ahead. For a utility serving millions
of customers, this precision enables more efficient unit commitment, reduces spinning reserve
requirements, and lowers operational costs — directly benefiting ratepayers.
Tech Stack
🐍
Python 3.11
Core language
🐼
Pandas
Data wrangling
🌲
XGBoost
Gradient-boosted trees
📈
Prophet
Time series model
📊
Statsmodels
SARIMAX baseline
🔬
Optuna
Hyperparameter tuning
🔄
MLflow
Experiment tracking
🎛️
Streamlit
Portfolio app
📉
Plotly
Interactive charts
🧮
scikit-learn
Metrics & pipeline
☁️
NOAA API
Weather data
⚡
EIA-930
Energy operations data
🔗
PUDL
Data imputation
Dataset Summary
📋 Dataset Details
Energy Data (EIA-930)
· Source: US Energy Information Administration
· SOCO balancing authority region
· Hourly demand, generation, interchange
· Data imputation via PUDL toolkit
Weather Data (NOAA)
· 7 representative SOCO cities
· Temperature, humidity, dew point
· Solar radiation, wind speed
· Computed HDD/CDH per city + region mean
24 pre-computed figures covering demand patterns, data quality, feature engineering,
demand drivers, seasonality, and weather relationships.
Model Results
⚖️ Model Comparison Takeaways
A well-specified SARIMAX(1,1,1)(1,1,1,24) with rolling 24-h evaluation outperforms
tuned Prophet (RMSE 1,829 vs 1,956 MWh) — demonstrating that classical models remain
competitive when evaluated in their natural day-ahead operating regime. XGBoost wins by
exploiting 43 lag, weather, and calendar features simultaneously (RMSE 1,147 MWh, −37% vs SARIMAX).
Full Metrics Table
Model              RMSE (MWh)   MAE (MWh)   MAPE    Description
XGBoost (Tuned)    1,147        —           2.90%   Gradient-boosted trees, 43 features
SARIMAX baseline   1,829        —           4.82%   Rolling 24-h day-ahead evaluation
Prophet (Tuned)    1,956        —           —       Additive time series model
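The three headline metrics can be reproduced in a few lines of NumPy and scikit-learn; the demand values below are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual vs. predicted hourly demand (MWh)
actual = np.array([30_000.0, 31_500.0, 29_800.0])
pred = np.array([29_500.0, 32_000.0, 30_100.0])

rmse = mean_squared_error(actual, pred) ** 0.5    # root mean squared error
mae = mean_absolute_error(actual, pred)           # mean absolute error
mape = np.mean(np.abs((actual - pred) / actual))  # mean absolute percentage error
```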
Showing the first 60 days of the held-out
test period (18,567 hours total). Use the slider to zoom in or out.
Per-model MAE and RMSE for Prophet, XGBoost, and SARIMAX, computed over an adjustable window.
Residual sign convention: Residual = Actual − Predicted.
Negative residuals = model overpredicted demand.
Positive residuals = model underpredicted demand.
Residual Summary Metrics
Mean residual (bias), standard deviation, MAE, and the worst over- and under-predictions, computed for the selected window.
Full test-period view with range selector and drag-to-zoom. Use the buttons or
drag the range slider below the chart to navigate.
Methodology
End-to-End Pipeline
📥
Data Ingestion
EIA-930 + NOAA API
🔧
Preprocessing
PUDL imputation · timezone fix
⚗️
Feature Engineering
43–48 features
📊
SARIMAX Baseline
Classical reference
🔍
Optuna Tuning
Prophet + XGBoost
🏆
XGBoost (Tuned)
RMSE 1,147 MWh · MAPE 2.90%
Project Phases
Phase 1
Data Collection & Storage
Pulled 7-year hourly EIA-930 demand series and NOAA weather for 7 SOCO-region cities. Applied PUDL imputation to fill gaps. Unified on UTC timestamps.
Phase 2
Exploratory Analysis
Identified strong 24h, 168h, and annual seasonality. Confirmed U-curve demand–temperature relationship. Flagged humidity and holiday effects.
Phase 3
SARIMAX Baseline
Fit SARIMAX(1,1,1)(1,1,1,24) with seasonal differencing, enforce_stationarity=True, and HDD/CDH/holiday exogenous variables. Evaluated with a rolling 24-h horizon. RMSE: 1,829 MWh · MAPE: 4.82%. Notably outperforms tuned Prophet.
Phase 4
Feature Engineering
Built 48 features: lag-24h/48h/168h, rolling 24h/168h stats, HDD/CDH per city, cyclical sin/cos encodings, and temperature × hour interaction terms.
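A condensed pandas sketch of the lag, rolling, and cyclical features; the data and column names are illustrative, not the project's actual code:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly demand frame
df = pd.DataFrame(
    {"demand": np.random.default_rng(1).normal(30_000, 2_000, 24 * 21)},
    index=pd.date_range("2021-01-01", periods=24 * 21, freq="h"),
)

# Lag features: same hour yesterday, two days ago, and one week ago
for lag in (24, 48, 168):
    df[f"demand_lag_{lag}h"] = df["demand"].shift(lag)

# Rolling stats over the past day/week, shifted by 1 h to avoid leakage
for window in (24, 168):
    df[f"demand_roll_mean_{window}h"] = df["demand"].shift(1).rolling(window).mean()

# Cyclical sin/cos encoding of hour-of-day
hour = df.index.hour
df["hour_sin"] = np.sin(2 * np.pi * hour / 24)
df["hour_cos"] = np.cos(2 * np.pi * hour / 24)
```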
Phase 5
Optuna Hyperparameter Tuning
Ran 50+ trials each for Prophet (CV-based RMSE) and XGBoost (time-series CV). Best XGBoost params: max_depth=6, n_estimators=790, learning_rate=0.058.
Phase 6
Evaluation & MLflow Tracking
Evaluated all models on the held-out 20% test set (18,567 hours). XGBoost wins with RMSE 1,147 MWh and MAPE 2.90% — 37% improvement over SARIMAX (rolling 24-h). SARIMAX outperforms tuned Prophet on all three metrics.
Key Design Decisions
80/20 Chronological Split
No shuffling — the split respects temporal order to prevent data leakage.
Train: 74,266 hours. Test: 18,567 hours (last ~2.1 years).
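The split itself is trivial to implement; the discipline is in never calling shuffle. A sketch:

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, test_frac: float = 0.2):
    """Split a time-ordered frame into train/test with no shuffling."""
    cut = int(len(df) * (1 - test_frac))
    return df.iloc[:cut], df.iloc[cut:]

# Toy example: 100 hourly rows -> 80 train / 20 test, order preserved
df = pd.DataFrame({"demand": range(100)},
                  index=pd.date_range("2021-01-01", periods=100, freq="h"))
train, test = chronological_split(df)
```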
Lag Features as Primary Predictors
demand_lag_24h and demand_lag_168h are the two highest-correlated features.
They capture daily and weekly periodicity more directly than cyclical encodings.
Weather Aggregation Strategy
Per-city HDD/CDH retained alongside region-mean values. City-level granularity
captures spatial variation in grid load across the SOCO service territory.
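A sketch of the per-city degree-hour computation, assuming the common 65 °F base temperature; city names, columns, and values are illustrative:

```python
import pandas as pd

# Hypothetical per-city hourly temperatures (°F)
temps = pd.DataFrame(
    {"atlanta_temp": [50.0, 70.0, 85.0], "birmingham_temp": [48.0, 68.0, 90.0]},
    index=pd.date_range("2021-07-01", periods=3, freq="h"),
)
BASE = 65.0  # assumed degree-day base temperature

for city in temps.columns:
    name = city.replace("_temp", "")
    # Heating/cooling degree-hours: load drivers below/above the base temp
    temps[f"{name}_hdh"] = (BASE - temps[city]).clip(lower=0)
    temps[f"{name}_cdh"] = (temps[city] - BASE).clip(lower=0)

# Region-mean degree-hours kept alongside the per-city columns
temps["region_hdh"] = temps.filter(like="_hdh").mean(axis=1)
```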
Optuna Over Grid Search
Tree-structured Parzen Estimator (TPE) sampler efficiently explored the
continuous hyperparameter space — far cheaper than exhaustive grid search
given XGBoost's 8-dimensional parameter space.
Lessons Learned
Lesson 01
CV Results Don't Always Predict Test-Set Winners
Prophet had a lower cross-validation RMSE during tuning than XGBoost, but XGBoost
significantly outperformed it on the held-out test set. This highlights the importance
of a true out-of-sample evaluation — CV scores are model selection tools, not final verdicts.
Lesson 02
Timezone Handling is a First-Class Engineering Concern
EIA-930 and NOAA data arrive in different timezones. A silent timezone mismatch
creates a systematic 5-hour shift in demand-weather alignment — one of the most
damaging data bugs possible in this domain. UTC everywhere, validate early.
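A minimal illustration of the failure mode and the fix; timestamps and values here are fabricated:

```python
import pandas as pd

# Hypothetical: demand stamped in US Eastern, weather already in UTC
eia = pd.DataFrame(
    {"demand": [31_000, 30_500]},
    index=pd.date_range("2021-01-01 00:00", periods=2, freq="h",
                        tz="US/Eastern"),
)
noaa = pd.DataFrame(
    {"temp": [41.0, 40.0]},
    index=pd.date_range("2021-01-01 05:00", periods=2, freq="h", tz="UTC"),
)

# Normalize to UTC before joining: a naive join on local timestamps
# would silently shift demand vs. weather by 5 hours
merged = eia.tz_convert("UTC").join(noaa, how="inner")
```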
Lesson 03
Lag Features Are Both the Most Powerful and Most Dangerous
demand_lag_24h is the strongest predictor — but it also creates the highest risk
of data leakage if the split is not strictly chronological. The 80/20 temporal
split with no shuffling was non-negotiable.
Lesson 04
Prophet's Built-In Diagnostics Are Production-Ready
Prophet's cross-validation and component plots (trend, weekly, yearly, holidays)
are genuinely useful for stakeholder communication — not just model development.
The decomposition makes seasonality immediately interpretable.
Lesson 05
Portfolio Apps Are Themselves Engineering Products
Building both a Streamlit app and this static HTML version forced clear separation
between data generation (Python scripts), model artefacts (JSON/CSV), and presentation
(app layer). This separation made both apps easier to maintain and deploy independently.