Target variable: arrival delay of 15 minutes or more.
Best ROC-AUC: 0.6905 (Bidirectional LSTM)
Best F1 Score: 0.4571 (Bidirectional LSTM)
Best Recall: 0.6761 (Bidirectional LSTM)
| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC |
| --- | --- | --- | --- | --- | --- |
| Logistic Regression (baseline) | 0.6099 | 0.3305 | 0.6452 | 0.4371 | 0.6552 |
| MLP | 0.6189 | 0.3414 | 0.6711 | 0.4526 | 0.6847 |
| Bidirectional LSTM (best) | 0.6230 | 0.3453 | 0.6761 | 0.4571 | 0.6905 |
Why accuracy looks low: The dataset is ~80% not-delayed, so a model that always predicts "not delayed" scores about 80% accuracy without learning anything, which is higher than any model in the table while catching zero delays. We therefore treat Recall (the share of actual delays the model catches) and F1 Score as the primary metrics. The LSTM's ROC-AUC of 0.6905 means it ranks a randomly chosen delayed flight above a randomly chosen non-delayed flight about 69% of the time, versus 50% for random guessing.
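The argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the confusion-matrix counts and scores are made-up toy numbers, not results from this project, and the helper names `classification_metrics` and `pairwise_auc` are hypothetical.

```python
from itertools import product


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def pairwise_auc(scores_delayed, scores_on_time):
    """ROC-AUC as the share of (delayed, on-time) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for d, o in product(scores_delayed, scores_on_time):
        if d > o:
            wins += 1.0
        elif d == o:
            wins += 0.5
    return wins / (len(scores_delayed) * len(scores_on_time))


# Toy data (assumption): 1,000 flights, 200 of them delayed.
# A model that always predicts "not delayed" has no true or false positives.
always_negative = classification_metrics(tp=0, fp=0, fn=200, tn=800)

# A hypothetical model that catches 130 delays at the cost of 250 false alarms.
toy_model = classification_metrics(tp=130, fp=250, fn=70, tn=550)

print(always_negative)  # high accuracy, zero recall
print(toy_model)        # lower accuracy, but most delays are caught

# Toy scores (assumption): 3 of the 4 delayed/on-time pairs are ordered correctly.
toy_auc = pairwise_auc([0.9, 0.6], [0.7, 0.2])
print(toy_auc)
```

The always-negative model wins on accuracy but has zero recall, which is why accuracy alone is a poor yardstick on imbalanced data.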
Understanding model errors is as important as measuring accuracy. These charts show how confident each model is on delayed vs. non-delayed flights, and which departure hours are hardest to predict correctly.
Probability distributions: A model that separates the classes well produces two distinct humps, one near 0 for non-delayed flights and one near 1 for delayed flights. The overlap in the middle represents the hard cases where the model is uncertain.

Error by hour: Errors cluster at early-morning (hours 0–5) and late-night (hours 21–23) departures. These windows have sparse training data and unpredictable disruption patterns that schedule-based features alone cannot capture.
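A per-hour error breakdown like the one described above can be computed by grouping predictions by scheduled departure hour. This is a minimal sketch, not the project's actual analysis code: the 0.5 decision threshold, the function name `error_rate_by_hour`, and the toy records are all assumptions.

```python
from collections import defaultdict


def error_rate_by_hour(records, threshold=0.5):
    """Share of misclassified flights per departure hour.

    records: iterable of (dep_hour, true_label, predicted_probability).
    threshold: probability at or above which a flight is predicted delayed
               (0.5 is an assumption, not a value taken from the project).
    """
    errors = defaultdict(int)
    counts = defaultdict(int)
    for dep_hour, y_true, prob in records:
        y_pred = 1 if prob >= threshold else 0
        counts[dep_hour] += 1
        errors[dep_hour] += int(y_pred != y_true)
    return {hour: errors[hour] / counts[hour] for hour in sorted(counts)}


# Toy records (assumption): (DEP_HOUR, delayed?, model probability)
toy_records = [
    (3, 1, 0.40),   # missed delay
    (3, 0, 0.55),   # false alarm
    (9, 0, 0.20),   # correct
    (9, 1, 0.70),   # correct
    (9, 0, 0.60),   # false alarm
    (22, 1, 0.45),  # missed delay
    (22, 0, 0.30),  # correct
]

rates = error_rate_by_hour(toy_records)
for hour, rate in rates.items():
    print(f"hour {hour:02d}: error rate {rate:.2f}")
```

Hours with few flights produce noisy rates, so it helps to read such a breakdown alongside the per-hour counts.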
Team
Elias Estacion
Rochane Hurst
Meliton Rojas
Bricio Blancas Salgado
Wendy Santiago
Michael Vu
Dataset
- Source: U.S. Bureau of Transportation Statistics
- Period: May–October 2025
- Size: ~4.2 million flights
- Target: Arrival delay >= 15 minutes
- Features: 7 (schedule-based)
- Split: 80% train / 10% val / 10% test
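The 80/10/10 split listed above could be produced as follows. This sketch assumes a simple seeded random shuffle; the page does not say whether the actual split was random, stratified, or chronological, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle row indices and cut them into train / val / test lists.

    Whatever is left after the train and val cuts goes to the test set.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(n_rows * train_frac)
    n_val = round(n_rows * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Toy size (assumption); the real dataset has ~4.2 million rows.
train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

With roughly 80% of flights not delayed, a stratified split would keep the delay rate equal across the three sets, and a chronological split would better mimic predicting future flights.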
Features Used
- DAY_OF_WEEK: Day of the week the flight operates
- DEP_HOUR: Scheduled departure hour
- ARR_HOUR: Scheduled arrival hour
- DISTANCE: Flight distance in miles
- CARRIER_ENC: Airline (label encoded)
- ORIGIN_ENC: Origin airport (label encoded)
- DEST_ENC: Destination airport (label encoded)
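Label encoding, as used for the three `_ENC` features above, maps each distinct category to an integer. The sketch below assigns codes in sorted order, which is an assumption about how the codes were generated; the helper names and the sample carrier codes are illustrative.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code
            for code, category in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Replace each category with its integer code.

    Raises KeyError for a category that was not seen when fitting.
    """
    return [mapping[value] for value in values]


# Sample airline codes (assumption, for illustration only).
carriers = ["UA", "DL", "AA", "DL", "WN"]

carrier_codes = fit_label_encoder(carriers)
carrier_enc = encode(carriers, carrier_codes)

print(carrier_codes)
print(carrier_enc)
```

The integer codes carry no real ordering, so a model may read meaning into the distance between two codes. Fitting the mapping on the training split only avoids leaking information from validation and test data.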
LSTM Architecture
1. Input sequence (7 timesteps x 1 feature)
2. Bidirectional LSTM (64 units) + BatchNorm + Dropout
3. Bidirectional LSTM (32 units) + BatchNorm + Dropout
4. Dense (64) + Dense (32) classification head
5. Sigmoid output: delay probability
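To get a feel for the size of this stack, the sketch below traces output shapes and parameter counts layer by layer. It rests on several assumptions not stated on this page: standard Keras-style LSTM cells (four gates, one bias vector each), forward and backward outputs concatenated, the first recurrent layer returning the full sequence and the second only its final output, and each of the 7 schedule features treated as one timestep. The resulting numbers are arithmetic under those assumptions, not figures read from the trained model.

```python
def lstm_params(units, input_dim):
    """Weights in one LSTM direction: 4 gates x (kernel + recurrent + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


def batchnorm_params(features):
    """Gamma and beta (trainable) plus moving mean and variance."""
    return 4 * features


timesteps, n_features = 7, 1  # input sequence: 7 timesteps x 1 feature

bilstm1_out = 2 * 64  # two directions concatenated (assumption)
bilstm2_out = 2 * 32

# (layer name, output shape without the batch axis, parameter count)
layers = [
    ("BiLSTM(64)", (timesteps, bilstm1_out), 2 * lstm_params(64, n_features)),
    ("BatchNorm", (timesteps, bilstm1_out), batchnorm_params(bilstm1_out)),
    ("Dropout", (timesteps, bilstm1_out), 0),
    ("BiLSTM(32)", (bilstm2_out,), 2 * lstm_params(32, bilstm1_out)),
    ("BatchNorm", (bilstm2_out,), batchnorm_params(bilstm2_out)),
    ("Dropout", (bilstm2_out,), 0),
    ("Dense(64)", (64,), dense_params(64, bilstm2_out)),
    ("Dense(32)", (32,), dense_params(32, 64)),
    ("Sigmoid output", (1,), dense_params(1, 32)),
]

total_params = sum(params for _, _, params in layers)

for name, shape, params in layers:
    print(f"{name:<15} output={str(shape):<10} params={params}")
print("total parameters:", total_params)
```

Under these assumptions the two recurrent layers hold most of the weights, and the dense head is comparatively small.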
Predictions use the actual trained Bidirectional LSTM model, loaded via TensorFlow.js. Logistic Regression and MLP values are approximated for comparison. Source code and notebooks are available in this repository.