CAISO × ELECTRICITY LOAD FORECASTING

Predicting tomorrow’s demand, one hour at a time.

An end-to-end forecasting project that transforms historical grid demand, weather, calendar context, and weekly patterns into hourly electricity-load predictions for California and four subregions.


Team research project · Presented at Millennium Management Global Investment

BEST FINAL MAE: 520 MW

INPUT WINDOW: 168 hours

REGIONS: 5

TOP MODEL: ST-CALNet

RECRUITER SNAPSHOT

What this project demonstrates

Data engineering

Built a reproducible workflow for aligning hourly grid demand, regional observations, weather, calendar variables, and lag features.

Model benchmarking

Compared simple baselines with Random Forest, XGBoost, LSTM variants, and ST-CALNet using chronological evaluation.

Time-series reasoning

Used full-week windows, no-shuffle data splits, train-only scaling, and lag construction to prevent future-data leakage.

Product communication

Translated technical results into operational questions: peak demand, regional specialization, reliability, and deployment tradeoffs.

THE BUSINESS PROBLEM

Electricity must be available before customers ask for it.

Grid operators need an accurate view of future demand to plan generation, manage reserves, and prepare for peak hours. Forecast too low and reliability is at risk. Forecast too high and resources may be scheduled inefficiently.

01 / ACCURACY

Reduce the size of forecasting errors

Better forecasts help operators anticipate morning ramps, afternoon cooling demand, and evening peaks.

02 / INTERPRETABILITY

Explain why the model made its prediction

Simpler models expose coefficients and feature effects, making them easier to audit and communicate.

THE CORE QUESTION

When is additional complexity worth it?

The project benchmarks each architecture against simpler baselines under the same chronological evaluation, so added complexity counts only when it produces a measurable accuracy gain.

INTERACTIVE FORECAST LAB

See how an hourly load forecast is built.

Select a region, model, and forecasting strategy. The interface then walks through the same conceptual pipeline used in the project.

STAGE 01

Collect historical electricity demand

Hourly CAISO and subregion demand becomes the foundation of the forecast.

The pipeline preserves chronological order so the model only learns from information that would have existed at prediction time.

24-HOUR PORTFOLIO DEMONSTRATION

California Independent System Operator

The system-wide view combines several climates, customer groups, and regional demand patterns.

[Chart: representative 24-hour actual vs. predicted load]

ACTUAL PEAK: 32,680 MW at 6 PM

PREDICTED PEAK: 32,555 MW at 6 PM

REPORTED PROJECT RESULT: 520 MW final MAE (ST-CALNet: CNN + LSTM + attention)

The chart uses representative portfolio data to demonstrate the product experience. Reported project metrics are labeled separately and come from the team’s completed model evaluation.

FROM RAW DATA TO FORECAST

The model does not begin with a neural network.

Reliable forecasting starts with a disciplined data pipeline. The project aligns each observation to the correct hour, region, and historical context before any model is trained.

Hourly grid demand

CAISO and regional load observations provide the target the models learn to predict.

Weather context

Temperature and other environmental variables help explain cooling- and heating-driven demand.

Time features

Hour, weekday, season, and lagged demand expose repeated daily and weekly patterns.

Chronological split

Training, validation, and test sets remain ordered to prevent information from the future leaking backward.
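A minimal sketch of the chronological split and train-only scaling described above. It assumes a plain list of hourly loads already sorted by time and hypothetical 70/15/15 split fractions (the project's actual proportions are not stated here); `chronological_split` and `TrainOnlyScaler` are illustrative names. The scaler's statistics come from the training slice only, so validation and test data never influence preprocessing.

```python
from statistics import mean, pstdev


def chronological_split(values, train_frac=0.70, val_frac=0.15):
    """Split an ordered series into train/validation/test without shuffling.

    The 70/15/15 default is an illustrative assumption, not the project's setting.
    """
    n = len(values)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return values[:train_end], values[train_end:val_end], values[val_end:]


class TrainOnlyScaler:
    """Standardize with a mean and standard deviation fitted on training data only."""

    def fit(self, train):
        self.mu = mean(train)
        self.sigma = pstdev(train) or 1.0  # guard against a constant series
        return self

    def transform(self, values):
        return [(v - self.mu) / self.sigma for v in values]


# Hypothetical hourly loads in MW, already in timestamp order.
loads = [float(x) for x in range(100, 120)]
train, val, test = chronological_split(loads)
scaler = TrainOnlyScaler().fit(train)  # fitted on train only: no future leakage
train_z, val_z, test_z = (scaler.transform(s) for s in (train, val, test))
```

Because the split is positional, every validation hour comes after every training hour, and every test hour after every validation hour.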

MODEL INPUTS

What the forecast can see

Classical models receive engineered tabular features. Sequence models also receive an ordered 168-hour history so they can learn transitions across an entire week.

Historical load · Temperature · Hour of day · Day of week · Seasonality · Weekly lag · Rolling context · Region · Future-known time features
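Several of the inputs listed above can be derived from the load series and its timestamps alone. This sketch assumes a gap-free hourly series; `build_features`, the field names, and the 24-hour rolling window are hypothetical choices, and temperature and region are omitted for brevity. Every lag and rolling value uses only hours strictly before the target hour.

```python
from datetime import datetime, timedelta
from statistics import mean

WEEKLY_LAG = 168  # hours in one week
ROLLING_HOURS = 24  # hypothetical rolling-context window


def build_features(start, loads):
    """Return one feature dict per hour that has a full week of history.

    `start` is the timestamp of loads[0]; the series is assumed to have no gaps.
    """
    rows = []
    for t in range(WEEKLY_LAG, len(loads)):
        ts = start + timedelta(hours=t)
        rows.append({
            "timestamp": ts,
            "hour_of_day": ts.hour,  # future-known calendar feature
            "day_of_week": ts.weekday(),  # 0 = Monday
            "month": ts.month,  # coarse seasonality proxy
            "lag_1h": loads[t - 1],
            "lag_168h": loads[t - WEEKLY_LAG],  # same hour last week
            "rolling_24h_mean": mean(loads[t - ROLLING_HOURS:t]),  # excludes hour t
            "target": loads[t],
        })
    return rows


start = datetime(2023, 1, 2, 0)  # a Monday
loads = [1000.0 + (h % 24) * 10 for h in range(200)]  # synthetic daily shape
features = build_features(start, loads)
```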

A KEY FINDING

Forecast locally, then add the regions together.

A single CAISO model must compromise across several climates and customer mixes. Separate regional models can specialize before their forecasts are summed into one system-level prediction.

SCE + PGE + SDGE + VEA = CAISO system forecast (sum of regional forecasts)

DIRECT VS. BOTTOM-UP CAISO FORECASTING (lower MAE is better)

Model               Direct MAE   Bottom-up MAE   Improvement
Linear Regression   1,446 MW     1,359 MW        6.1%
Random Forest       1,131 MW     1,031 MW        8.8%
XGBoost             1,146 MW     1,014 MW        11.6%

Specialization: each regional model learns its own climate and demand shape.

Error cancellation: some regional over- and under-predictions offset when summed.
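A toy illustration of bottom-up aggregation and error cancellation, using made-up regional numbers rather than project data and assuming the system load equals the sum of the four regional loads. Because the absolute value of a sum never exceeds the sum of absolute values, the system-level miss is at most the total of the regional misses, and it shrinks whenever regional errors have opposite signs.

```python
REGIONS = ("SCE", "PGE", "SDGE", "VEA")

# Hypothetical single-hour values in MW (illustration only, not project data).
actual = {"SCE": 14000.0, "PGE": 12000.0, "SDGE": 3000.0, "VEA": 100.0}
forecast = {"SCE": 14300.0, "PGE": 11800.0, "SDGE": 2950.0, "VEA": 110.0}


def bottom_up(regional):
    """Sum regional values into one system-level value."""
    return sum(regional[r] for r in REGIONS)


system_actual = bottom_up(actual)
system_forecast = bottom_up(forecast)

system_error = abs(system_forecast - system_actual)
sum_regional_errors = sum(abs(forecast[r] - actual[r]) for r in REGIONS)

print(f"system error: {system_error:.0f} MW")  # 60 MW
print(f"sum of regional errors: {sum_regional_errors:.0f} MW")  # 560 MW
```

Here SCE's 300 MW over-forecast is largely offset by PGE's and SDGE's under-forecasts, so the summed forecast misses by far less than the regions do individually.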

MODEL PROGRESSION

Complexity had to earn its place.

Each model answered a different question—from whether simple coefficients were sufficient to whether attention could identify the most useful moments in a full week of history.

Classical baseline

Linear Regression

Assigns one coefficient to each weather, calendar, hour, and lag feature. The prediction is a weighted sum of those inputs.

WHY IT MATTERED

Established a transparent baseline and made it possible to measure whether added complexity produced meaningful value.

REPORTED RESULT: 613.9 MW pooled MAE
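Written as a formula, the baseline predicts load at hour t as an intercept plus one coefficient per feature. Fitting the coefficients by ordinary least squares on the training hours is the usual choice and is assumed here, since the fitting method is not stated.

```latex
\hat{y}_t = \beta_0 + \sum_{j=1}^{p} \beta_j \, x_{t,j},
\qquad
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{t \in \mathcal{T}_{\text{train}}} \bigl( y_t - \hat{y}_t \bigr)^2
```

Here x_{t,j} is feature j (a weather, calendar, hour, or lag variable) at hour t, and T_train is the set of training hours. Each β_j can be read directly as the change in forecast load per unit change in its feature, which is what makes the model easy to audit.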

Tree ensemble

Random Forest

Averages hundreds of decision trees, allowing the forecast to respond differently across temperatures, hours, and regions.

WHY IT MATTERED

Demonstrated that nonlinear feature interactions materially improve forecasting over a linear baseline.

REPORTED RESULT: 488.3 MW pooled MAE
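A deliberately small sketch of the bagging idea behind a random forest, using synthetic data. It simplifies heavily: each tree is a depth-1 stump on one randomly chosen feature, whereas real Random Forest implementations grow deep trees and sample features at every split. It only shows how averaging trees trained on bootstrap samples produces the forecast; `fit_stump` and `fit_forest` are illustrative names.

```python
import random
from statistics import mean


def fit_stump(X, y, feature):
    """Fit the best single split on one feature under squared error."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [yi for row, yi in zip(X, y) if row[feature] <= threshold]
        right = [yi for row, yi in zip(X, y) if row[feature] > threshold]
        if not right:  # the largest value leaves nothing to split off
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # the feature is constant in this bootstrap sample
        m = mean(y)
        return lambda row: m
    _, threshold, lm, rm = best
    return lambda row: lm if row[feature] <= threshold else rm


def fit_forest(X, y, n_trees=50, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feature = rng.randrange(n_features)  # random feature for this stump
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return lambda row: mean(tree(row) for tree in trees)  # average all trees


# Synthetic data: (temperature in °C, hour of day) -> load in MW.
X = [(t, h) for t in range(10, 40, 5) for h in (3, 15)]
y = [900.0 + (30.0 * (t - 25) if t > 25 else 0.0) + (200.0 if h == 15 else 0.0) for t, h in X]
forest = fit_forest(X, y)
```

Each tree's output is a mean of observed loads, so the averaged forecast always stays within the range seen in training, a known property of tree ensembles.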

Boosted trees

XGBoost

Builds trees sequentially so each new tree focuses on correcting the residual errors left by the previous ensemble.

WHY IT MATTERED

Created the strongest classical benchmark and provided a high bar for evaluating deep sequence models.

REPORTED RESULT: 1,014 MW bottom-up CAISO MAE
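The residual-correcting idea can be sketched as plain gradient boosting with squared-error loss: each new stump is fitted to the residuals the current ensemble leaves behind and added with a shrinkage factor. This is a simplified stand-in, not XGBoost itself, which adds a regularized objective, second-order gradient information, and deeper trees. The data, `boost`, and its hyperparameters are illustrative.

```python
from statistics import mean


def fit_stump(x, r):
    """Best single threshold split of 1-D input x against residuals r."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [ri for xi, ri in zip(x, r) if xi <= threshold]
        right = [ri for xi, ri in zip(x, r) if xi > threshold]
        lm, rm = mean(left), mean(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda xi: lm if xi <= threshold else rm


def boost(x, y, n_rounds=100, learning_rate=0.1):
    base = mean(y)  # initial prediction: the training mean
    stumps = []
    preds = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)  # fit the errors left so far
        stumps.append(stump)
        preds = [pi + learning_rate * stump(xi) for xi, pi in zip(x, preds)]
    return lambda xi: base + sum(learning_rate * s(xi) for s in stumps)


def mae(y, yhat):
    return mean(abs(a - b) for a, b in zip(y, yhat))


# Synthetic load (MW) that rises nonlinearly with temperature (°C).
temps = list(range(10, 41, 2))
loads = [900.0 + 0.5 * (t - 20) ** 2 for t in temps]
model = boost(temps, loads)
```

A small learning rate means each stump corrects only part of the remaining error, which trades more rounds for smoother, less overfit corrections.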

Deep sequence model

Bidirectional LSTM

Processes a 168-hour window in both directions, learning daily and weekly demand patterns from an ordered sequence.

WHY IT MATTERED

Showed that temporal sequence learning adds signal beyond a carefully tuned tree-based benchmark.

REPORTED RESULT: 827 MW MAE · 3.08% MAPE
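Sequence models such as the Bidirectional LSTM consume fixed-length windows rather than single rows. A minimal sketch of the windowing step, assuming a single load channel and a one-step-ahead target; the project's exact horizon and channel layout are not specified here, and `make_windows` is an illustrative name. Because every window ends before its target hour, reading it in both directions still uses only past information.

```python
WINDOW = 168  # one week of hourly history


def make_windows(series, window=WINDOW, horizon=1):
    """Pair each `window`-hour history with the value `horizon` hours after it.

    Windows are produced in time order and never include the target hour.
    """
    inputs, targets = [], []
    for end in range(window, len(series) - horizon + 1):
        inputs.append(series[end - window:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


hourly = list(range(200))  # stand-in for 200 consecutive hourly loads
X, y = make_windows(hourly)
```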

CNN + LSTM + attention

ST-CALNet

Combines convolutional feature extraction, recurrent sequence learning, and attention over the most relevant historical moments.

WHY IT MATTERED

Delivered the project’s strongest final result by integrating local patterns, long-range temporal context, and selective attention.

REPORTED RESULT: 520 MW final MAE

CLASSICAL MODEL BENCHMARK

Pooled performance across five regions

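These values come from the pooled classical evaluation across CAISO and four subregions. Later results were not necessarily evaluated in the same aggregation context (the XGBoost card above, for example, reports a bottom-up CAISO MAE), so these pooled numbers should not be compared directly with them.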

Model               MAE        RMSE         Context
Linear Regression   613.9 MW   1,071.6 MW   Pooled across CAISO and four subregions
Random Forest       488.3 MW   868.9 MW     Pooled across CAISO and four subregions
XGBoost             483.7 MW   870.2 MW     Best pooled classical MAE

BEST FINAL MODEL

Inside ST-CALNet

The final architecture combines three complementary ideas: a CNN for local patterns, an LSTM for sequence memory, and attention for deciding which historical moments deserve the most weight.

Historical inputs

Seven days of load, weather, lag, and time features.

Convolution

Detects local ramps, spikes, and short-term hourly patterns.

LSTM memory

Learns how demand evolves across daily and weekly sequences.

Attention

Weights the historical moments most useful for the forecast.

Load forecast

Produces the expected future demand in megawatts.
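To make "detects local ramps" concrete: a 1-D convolution slides a small kernel along the sequence. In a trained CNN the kernel weights are learned; the hand-picked difference kernel below is an illustrative assumption that responds most strongly where load changes fastest. Like most deep-learning libraries, the sketch computes cross-correlation (no kernel flip).

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution as used in CNN layers (cross-correlation, no flip)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical hourly loads (MW) with a morning ramp between hours 3 and 5.
loads = [20000, 20100, 20050, 20200, 22500, 24900, 25000, 25100]
ramp_kernel = [-1.0, 1.0]  # hour-over-hour change; a learned kernel would differ
response = conv1d(loads, ramp_kernel)
steepest = max(range(len(response)), key=lambda i: response[i])
```

The largest response lands at index 4, the step from hour 4 to hour 5, where load jumps by 2,400 MW.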

FINAL ST-CALNET RESULT: 520 MW mean absolute error

CNN

Extracts rapid hourly fluctuations and short local patterns.

LSTM

Tracks how demand evolves across the seven-day sequence.

ATTENTION

Focuses the forecast on the most relevant historical moments.
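Attention can be sketched as a softmax-weighted average over the sequence's hidden states. This assumes simple dot-product scoring against a query vector; ST-CALNet's actual scoring function and dimensions are not described here, so `attention_pool` and the vectors below are toy, hypothetical values.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by its dot-product score with the query, then average."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights


# Four toy hidden states (e.g. LSTM outputs for four past hours) and a query.
hidden = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.3, 0.2]]
query = [1.0, 1.0]
context, weights = attention_pool(hidden, query)
```

The third time step scores highest, so it receives most of the weight, and the pooled context vector is dominated by that hour's hidden state.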

EVALUATION WITHOUT THE JARGON

Four metrics answer four different questions.

MAE

How far off are we, on average?

Mean Absolute Error reports the typical miss in megawatts. It is the most intuitive headline metric for this project.

RMSE

Are there any especially large misses?

Root Mean Squared Error penalizes large errors more heavily, making it useful when peak-hour mistakes are especially costly.

MAPE

What percentage of demand did we miss?

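Mean Absolute Percentage Error normalizes the error by actual demand, so it must be interpreted carefully for very small regional loads: dividing by a small actual value turns a modest miss in megawatts into a large percentage.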

R²

How much demand variation is explained?

R² measures how well the forecast captures the overall rises, falls, and recurring patterns in electricity demand.
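The four metrics fit in a few lines each. This is a minimal standard-library sketch with toy numbers; production code would more likely call a library implementation. MAPE is undefined when an actual value is zero.

```python
import math
from statistics import mean


def mae(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    # Squaring before averaging makes large misses count disproportionately.
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mape(actual, predicted):
    # Error as a percentage of the actual value; undefined if an actual is zero.
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def r2(actual, predicted):
    mu = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mu) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot  # share of variance around the mean explained


actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 370.0]
```

For these values, MAE is 20, RMSE is about 22.4, MAPE is 8.125%, and R² is 0.96. The same 10 MW miss is 10% of a 100 MW load but only about 0.03% of a 30,000 MW system, which is why MAPE needs care for small regions.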

HONEST LIMITATIONS

A strong metric is not the end of the analysis.

The presentation treats limitations as part of the engineering result rather than hiding them behind the best number.

No future weather forecast

Historical weather helps explain past demand, but production forecasting should include the weather expected during the target horizon.

Next step: forecast-weather integration

Distribution shift

The Bidirectional LSTM’s validation-to-test gap suggests that late-period demand differed from the data used to fit the model.

Next step: rolling retraining

Missing event features

Holidays, wildfire periods, unusual heat events, and changing EV demand can alter load in ways ordinary calendar features miss.

Next step: event-aware features

Computational cost

ST-CALNet improves accuracy but requires more training and inference resources than the classical models.

Next step: latency and cost benchmarking

FROM NOTEBOOK TO SYSTEM

How the research could become a live forecasting service.

LIVE DATA

GridStatus.io

Collect the latest CAISO demand observations on a schedule.

FEATURE JOB

Hourly pipeline

Build the newest 168-hour window and future-known time inputs.

INFERENCE

Saved model weights

Load the selected model and generate the next forecast curve.

MONITORING

Hourly scoring

Compare predictions with actual load and watch for drift.
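The monitoring step could start as simply as a rolling MAE compared against a reference level. A minimal sketch, assuming a hypothetical `RollingMAEMonitor` helper, a 168-hour scoring window, and an alert when rolling MAE exceeds 1.5× a reference MAE; these values are illustrative, not settings from the project.

```python
from collections import deque


class RollingMAEMonitor:
    """Track the mean absolute error of the most recent `window` hourly scores."""

    def __init__(self, reference_mae, window=168, tolerance=1.5):
        self.reference_mae = reference_mae  # e.g. the MAE measured on the test set
        self.tolerance = tolerance  # hypothetical alert multiplier
        self.errors = deque(maxlen=window)  # oldest errors drop off automatically

    def score(self, actual_mw, predicted_mw):
        self.errors.append(abs(actual_mw - predicted_mw))
        return self.rolling_mae()

    def rolling_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def drifting(self):
        # Flag drift only once the window is full, to avoid noisy early alerts.
        full = len(self.errors) == self.errors.maxlen
        return full and self.rolling_mae() > self.tolerance * self.reference_mae


# Tiny window so the example is easy to follow.
monitor = RollingMAEMonitor(reference_mae=520.0, window=3)
for actual, predicted in [(30000, 29600), (31000, 30500), (32000, 31100)]:
    monitor.score(actual, predicted)
```

After these three hours the rolling MAE is 600 MW, below the 780 MW alert level; one more 1,500 MW miss would push it above the threshold and flag possible drift.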

PROJECT TAKEAWAYS

The final result is more than one model score.

Nonlinearity matters. Tree ensembles reduced pooled error substantially compared with the linear baseline.

Geography matters. Regional specialization and bottom-up aggregation improved the CAISO forecast.

Sequence matters. Full-week context allowed deep models to learn recurring demand patterns that static features only approximate.

Data still matters most. Better architecture cannot fully compensate for missing future weather or major external events.

CAISO LOAD FORECASTING

Data engineering, machine learning, and product thinking in one system.
