CAISO × ELECTRICITY LOAD FORECASTING
An end-to-end forecasting project that transforms historical grid demand, weather, calendar context, and weekly patterns into hourly electricity-load predictions for California and four subregions.
RECRUITER SNAPSHOT
Built a reproducible workflow for aligning hourly grid demand, regional observations, weather, calendar variables, and lag features.
Compared simple baselines with Random Forest, XGBoost, LSTM variants, and ST-CALNet using chronological evaluation.
Used full-week windows, no-shuffle data splits, train-only scaling, and lag construction to prevent future-data leakage.
Translated technical results into operational questions: peak demand, regional specialization, reliability, and deployment tradeoffs.
THE BUSINESS PROBLEM
Grid operators need an accurate view of future demand to plan generation, manage reserves, and prepare for peak hours. Forecast too low and reliability is at risk. Forecast too high and resources may be scheduled inefficiently.
Better forecasts help operators anticipate morning ramps, afternoon cooling demand, and evening peaks.
Simpler models expose coefficients and feature effects, making them easier to audit and communicate.
The project benchmarks each architecture against simpler alternatives to verify that accuracy gains are real and not merely a by-product of added model complexity.
INTERACTIVE FORECAST LAB
Select a region, model, and forecasting strategy. The interface then walks through the same conceptual pipeline used in the project.
Hourly CAISO and subregion demand becomes the foundation of the forecast.
The pipeline preserves chronological order so the model only learns from information that would have existed at prediction time.
The system-wide view combines several climates, customer groups, and regional demand patterns.
Demonstration metric calculated from the representative chart above.
CNN + LSTM + attention
The chart uses representative portfolio data to demonstrate the product experience. Reported project metrics are labeled separately and come from the team’s completed model evaluation.
FROM RAW DATA TO FORECAST
Reliable forecasting starts with a disciplined data pipeline. The project aligns each observation to the correct hour, region, and historical context before any model is trained.
CAISO and regional load observations provide the target the models learn to predict.
Temperature and environmental variables explain cooling and heating behavior.
Hour, weekday, season, and lagged demand expose repeated daily and weekly patterns.
Training, validation, and test sets remain ordered to prevent information from the future leaking backward.
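The leakage controls described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names, the 70/15/15 split fractions, the 24-hour and 168-hour lags, and the synthetic load series are all assumptions chosen for the example.

```python
from statistics import mean, pstdev

def make_lag_rows(load, lags=(24, 168)):
    """Build (hour_index, lag_features, target) rows using only past values."""
    start = max(lags)  # earliest hour for which every lag exists
    return [(t, [load[t - lag] for lag in lags], load[t])
            for t in range(start, len(load))]

def chronological_split(rows, train_frac=0.70, val_frac=0.15):
    """Split in time order, without shuffling (fractions are assumed)."""
    train_end = int(len(rows) * train_frac)
    val_end = int(len(rows) * (train_frac + val_frac))
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]

def fit_scaler(values):
    """Mean and standard deviation estimated from training values only."""
    sigma = pstdev(values)
    return mean(values), (sigma if sigma > 0 else 1.0)

def apply_scaler(values, mu, sigma):
    """Standardize any split with statistics fitted on the training split."""
    return [(v - mu) / sigma for v in values]

# Synthetic stand-in for four weeks of hourly load in MW (not project data).
load = [20000.0 + 300.0 * (h % 24) + 50.0 * (h // 24) for h in range(24 * 28)]

rows = make_lag_rows(load)
train, val, test = chronological_split(rows)

# The scaler never sees validation or test values.
mu, sigma = fit_scaler([target for _, _, target in train])
train_scaled = apply_scaler([target for _, _, target in train], mu, sigma)
test_scaled = apply_scaler([target for _, _, target in test], mu, sigma)
```

Because the scaler is fitted on the training slice alone, the test slice is standardized with statistics that would have been available before the test period began.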
Classical models receive engineered tabular features. Sequence models also receive an ordered 168-hour history so they can learn transitions across an entire week.
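One way to build the ordered 168-hour history is a sliding window over the hourly feature rows. The sketch below assumes a one-step-ahead target and a hypothetical helper named `make_windows`; the project's actual forecast horizon and feature layout are not stated here.

```python
WINDOW = 168  # one full week of hourly steps

def make_windows(features, targets, window=WINDOW):
    """Pair each target hour with the `window` hours of features before it."""
    X, y = [], []
    for t in range(window, len(targets)):
        X.append(features[t - window:t])  # hours t-window .. t-1, oldest first
        y.append(targets[t])              # the hour being predicted
    return X, y

# Synthetic example: two features per hour (load, temperature), ten days long.
hours = 24 * 10
features = [[float(h), 15.0 + (h % 24)] for h in range(hours)]
targets = [float(h) for h in range(hours)]

X, y = make_windows(features, targets)
```

Each window ends one hour before its target, so no sample contains information from the hour it is asked to predict.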
A KEY FINDING
A single CAISO model must compromise across several climates and customer mixes. Separate regional models can specialize before their forecasts are summed into one system-level prediction.
Specialization: each regional model learns its own climate and demand shape.
Error cancellation: some regional over- and under-predictions offset when summed.
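A small worked example makes the error-cancellation effect concrete. The two regions, their names, and every number below are hypothetical; the project itself covers four subregions and its results are not reproduced here.

```python
def mae(actual, predicted):
    """Mean absolute error in the same units as the inputs (MW)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def sum_regions(series_by_region):
    """Add regional series hour by hour to get a system-level series."""
    return [sum(values) for values in zip(*series_by_region.values())]

# Hypothetical hourly loads (MW) for two regions over four hours.
actual = {"north": [100.0, 120.0, 140.0, 130.0],
          "south": [200.0, 210.0, 250.0, 240.0]}
predicted = {"north": [105.0, 118.0, 146.0, 127.0],
             "south": [196.0, 214.0, 245.0, 244.0]}

system_actual = sum_regions(actual)
system_predicted = sum_regions(predicted)

regional_mae_total = sum(mae(actual[r], predicted[r]) for r in actual)  # 8.25
system_mae = mae(system_actual, system_predicted)                       # 1.25
```

The regional errors here have opposite signs in every hour, so most of them cancel in the sum. Cancellation is not guaranteed: when regional errors share the same sign, the system error approaches the sum of the regional errors, which is its upper bound.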
MODEL PROGRESSION
Each model answered a different question—from whether simple coefficients were sufficient to whether attention could identify the most useful moments in a full week of history.
Linear baseline: assigns one coefficient to each weather, calendar, hour, and lag feature. The prediction is a weighted sum of those inputs.
Established a transparent baseline and made it possible to measure whether added complexity produced meaningful value.
Random Forest: averages hundreds of decision trees, allowing the forecast to respond differently across temperatures, hours, and regions.
Demonstrated that nonlinear feature interactions materially improve forecasting over a linear baseline.
XGBoost: builds trees sequentially so each new tree focuses on correcting the residual errors left by the previous ensemble.
Created the strongest classical benchmark and provided a high bar for evaluating deep sequence models.
Bidirectional LSTM: processes a 168-hour window in both directions, learning daily and weekly demand patterns from an ordered sequence.
Showed that temporal sequence learning adds signal beyond a carefully tuned tree-based benchmark.
ST-CALNet: combines convolutional feature extraction, recurrent sequence learning, and attention over the most relevant historical moments.
Delivered the project’s strongest final result by integrating local patterns, long-range temporal context, and selective attention.
These values belong to the pooled classical evaluation and should not be compared as though every later model used the identical aggregation context.
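The idea of building trees sequentially on residuals can be illustrated with one-split trees (stumps) and squared error. This is a teaching sketch of the boosting principle only, not XGBoost itself, which adds regularization and second-order gradient information; the data, the number of rounds, and the learning rate are assumed values.

```python
def fit_stump(x, residuals):
    """Find the single threshold on x that minimizes squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def fit_boosted(x, y, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then fit each stump to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for pi, xi in zip(predictions, x)]
    return base, stumps, predictions

def predict_boosted(base, stumps, xi, learning_rate=0.3):
    """Score a new input with the same learning rate used in training."""
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical U-shaped relationship between temperature (°C) and load (MW).
temperature_c = [10.0, 14.0, 18.0, 22.0, 26.0, 30.0, 34.0, 38.0]
load_mw = [24000.0, 22500.0, 21000.0, 20500.0, 22000.0, 25000.0, 28500.0, 31000.0]

base, stumps, fitted = fit_boosted(temperature_c, load_mw)
baseline_mse = mse(load_mw, [base] * len(load_mw))
final_mse = mse(load_mw, fitted)
```

Each round shrinks the training error a little, which is why a learning rate below 1 is usually paired with many rounds.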
BEST FINAL MODEL
The final architecture combines three complementary ideas: a CNN for local patterns, an LSTM for sequence memory, and attention for deciding which historical moments deserve the most weight.
Input: seven days of load, weather, lag, and time features.
CNN: detects local ramps, spikes, and rapid short-term hourly fluctuations.
LSTM: learns how demand evolves across daily and weekly patterns within the seven-day sequence.
Attention: weights the historical moments most useful for the forecast.
Output: produces the expected future demand in megawatts.
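The attention step can be pictured as a softmax-weighted average over the hidden states produced for each hour of the window. The sketch below assumes simple dot-product scoring against a query vector; the exact attention form used in the final model is not specified here, and all vectors are made up for illustration.

```python
import math

def softmax(scores):
    """Convert raw scores to positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Score each time step against the query, then take a weighted average."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context

# Four time steps with two hidden units each (illustrative values only).
hidden_states = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.0], [0.3, 0.2]]
query = [1.0, 1.0]

weights, context = attention_pool(hidden_states, query)
```

The third time step scores highest against the query, so it receives the largest weight and dominates the context vector passed to the output layer.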
EVALUATION WITHOUT THE JARGON
Mean Absolute Error reports the typical miss in megawatts. It is the most intuitive headline metric for this project.
Root Mean Squared Error penalizes large errors more heavily, making it useful when peak-hour mistakes are especially costly.
Mean Absolute Percentage Error normalizes the error, but it must be interpreted carefully for very small regional loads.
R² measures how well the forecast captures the overall rises, falls, and recurring patterns in electricity demand.
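The four metrics can be computed directly from paired actual and predicted values. The numbers below are invented for the example and are not project results.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: typical miss in MW."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: squares errors, so large misses weigh more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean Absolute Percentage Error; undefined when an actual value is 0."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of variance in the actual series explained by the forecast."""
    avg = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Illustrative loads in MW.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 400.0]

mae_value = mae(actual, predicted)        # 12.5 MW
rmse_value = rmse(actual, predicted)      # about 16.58 MW
mape_value = mape(actual, predicted)      # 6.25 %
r2_value = r_squared(actual, predicted)   # 0.978

# The same 10 MW miss looks very different on a small regional load.
small_load_mape = mape([20.0], [30.0])    # 50 %
```

RMSE exceeds MAE here because the single 30 MW miss is squared before averaging. The last line shows why MAPE needs care for small regional loads: a 10 MW miss that is 10% of a 100 MW load becomes 50% of a 20 MW load.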
HONEST LIMITATIONS
The presentation treats limitations as part of the engineering result rather than hiding them behind the best number.
Historical weather helps explain past demand, but production forecasting should include the weather expected during the target horizon.
Next step: forecast-weather integration
The Bidirectional LSTM’s validation-to-test gap suggests that late-period demand differed from the data used to fit the model.
Next step: rolling retraining
Holidays, wildfire periods, unusual heat events, and changing EV demand can alter load in ways ordinary calendar features miss.
Next step: event-aware features
ST-CALNet improves accuracy but requires more training and inference resources than the classical models.
Next step: latency and cost benchmarking
FROM NOTEBOOK TO SYSTEM
Collect the latest CAISO demand observations on a schedule.
Build the newest 168-hour window and future-known time inputs.
Load the selected model and generate the next forecast curve.
Compare predictions with actual load and watch for drift.
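The monitoring step could be as simple as a rolling error check against the error level measured at evaluation time. The class name, the window length, and the 1.5x tolerance below are hypothetical choices for illustration, not settings taken from the project.

```python
from collections import deque

class DriftMonitor:
    """Flag drift when the rolling MAE exceeds a multiple of a baseline MAE."""

    def __init__(self, baseline_mae, window=24, tolerance=1.5):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)  # keeps only the newest errors

    def rolling_mae(self):
        return sum(self.errors) / len(self.errors)

    def is_drifting(self):
        # Wait for a full window so one early miss cannot trigger an alert.
        full = len(self.errors) == self.errors.maxlen
        return full and self.rolling_mae() > self.tolerance * self.baseline_mae

    def update(self, actual, predicted):
        """Record one hour's absolute error and return the drift flag."""
        self.errors.append(abs(actual - predicted))
        return self.is_drifting()

# Illustrative stream of (actual, predicted) loads in MW.
monitor = DriftMonitor(baseline_mae=100.0, window=3)
stream = [(1000.0, 1050.0), (1000.0, 920.0), (1000.0, 1100.0), (1000.0, 1300.0)]
flags = [monitor.update(actual, predicted) for actual, predicted in stream]
```

In this example the flag stays off until the 300 MW miss pushes the three-hour rolling MAE to 160 MW, above the 150 MW threshold. A flag like this is a natural trigger for the rolling retraining listed among the next steps.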
PROJECT TAKEAWAYS
Nonlinearity matters. Tree ensembles reduced pooled error substantially compared with the linear baseline.
Geography matters. Regional specialization and bottom-up aggregation improved the CAISO forecast.
Sequence matters. Full-week context allowed deep models to learn recurring demand patterns that static features only approximate.
Data still matters most. Better architecture cannot fully compensate for missing future weather or major external events.