Flight Details

Prediction Result

Fill in the flight details and click Run Prediction.
--
probability of delay (15+ min arrival)

Model Breakdown

Logistic Regression (simulated): --%
MLP (simulated): --%
Bidirectional LSTM (simulated): --%
Target variable: arrival delay of 15 minutes or more.
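As a minimal sketch of this target definition, assuming a per-flight arrival delay in minutes is available, the binary label could be derived as follows. The function name `is_delayed` and the sample values are hypothetical and are not taken from the project's notebooks.

```python
def is_delayed(arr_delay_minutes: float, threshold: float = 15.0) -> int:
    """Return 1 if the flight arrived `threshold` or more minutes late, else 0."""
    return int(arr_delay_minutes >= threshold)


# Hypothetical arrival delays in minutes (negative means an early arrival).
sample_delays = [-8.0, 0.0, 14.0, 15.0, 47.0]
labels = [is_delayed(d) for d in sample_delays]
print(labels)  # [0, 0, 0, 1, 1]
```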

Class Distribution

Delayed vs. not delayed — shows class imbalance

Class distribution

Delay Rate by Hour

Evening flights accumulate more delays

Delay by hour

Delay Rate by Day of Week

Fridays and Sundays have the highest delay rates

Delay by day

Delay Rate by Distance

Short-haul flights are more exposed to congestion

Delay by distance

Correlation Heatmap

No single feature dominates — problem is non-linear

Correlation heatmap

Best ROC-AUC: 0.6905 (Bidirectional LSTM)
Best F1 Score: 0.4571 (Bidirectional LSTM)
Best Recall: 0.6761 (Bidirectional LSTM)

Model                           Accuracy  Precision  Recall  F1 Score  ROC-AUC
Logistic Regression (Baseline)  0.6099    0.3305     0.6452  0.4371    0.6552
MLP                             0.6189    0.3414     0.6711  0.4526    0.6847
Bidirectional LSTM (Best)       0.6230    0.3453     0.6761  0.4571    0.6905
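
F1 is the harmonic mean of precision and recall, so the F1 column in the table above can be recomputed from the other two. The sketch below does that check; the values are copied from the table, and the helper name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
reported = {
    "Logistic Regression": (0.3305, 0.6452, 0.4371),
    "MLP": (0.3414, 0.6711, 0.4526),
    "Bidirectional LSTM": (0.3453, 0.6761, 0.4571),
}

for name, (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    # The table values are rounded to 4 decimals, so compare with a tolerance.
    print(f"{name}: reported={f1:.4f} recomputed={recomputed:.4f}")
```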

Training Curves

MLP vs LSTM loss and AUC over epochs

Training curves

Model Comparison Chart

All metrics side by side

Model comparison

Confusion Matrices

Where each model makes mistakes

Confusion matrices

ROC Curves

All three models overlaid

ROC curves

Precision-Recall Curves

More informative than ROC for imbalanced data

PR curves
Why accuracy looks low: About 80% of flights in the dataset are not delayed, so a model that always predicts "not delayed" scores about 80% accuracy without learning anything. For that reason, Recall (the share of actual delays caught) and F1 Score are the primary metrics here. The LSTM's ROC-AUC of 0.6905 means that, given one randomly chosen delayed flight and one randomly chosen non-delayed flight, the model scores the delayed one higher about 69% of the time, versus 50% for random guessing.
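
Both points can be illustrated on toy data. The sketch below uses a hypothetical set of 10 flights with an 80/20 class split; it is not the project's evaluation code. It shows that an always "not delayed" predictor reaches 80% accuracy with zero recall, and computes ROC-AUC directly as the fraction of (delayed, non-delayed) pairs that are ranked correctly.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual delays (label 1) that were predicted as delays."""
    positives = sum(y_true)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / positives if positives else 0.0


def pairwise_auc(y_true, scores):
    """ROC-AUC as the share of (delayed, non-delayed) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Toy data: 8 on-time flights followed by 2 delayed flights.
y_true = [0] * 8 + [1] * 2

always_on_time = [0] * 10
print(accuracy(y_true, always_on_time))  # 0.8
print(recall(y_true, always_on_time))    # 0.0

# A constant score carries no ranking information.
print(pairwise_auc(y_true, [0.5] * 10))  # 0.5

# Hypothetical model scores: one on-time flight (0.7) outranks a delayed one (0.65).
scores = [0.1, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.65, 0.9]
print(pairwise_auc(y_true, scores))      # 0.9375
```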
Understanding model errors is as important as measuring accuracy. These charts show how confident each model is on delayed vs. non-delayed flights, and which departure hours are hardest to predict correctly.

Probability Score Distributions

How confident each model is — separated by true class

Probability distributions

LSTM Error Rate by Departure Hour

Early morning and late night departures are the hardest to predict

Error rate by hour
Probability distributions: A model that separates the classes well produces two distinct humps, one near 0 for non-delayed flights and one near 1 for delayed flights. The overlap in the middle represents the hard cases where the model is uncertain.

Error by hour: Errors cluster at early-morning (hours 0–5) and late-night (hours 21–23) departures. These windows have sparse training data and unpredictable disruption patterns that schedule-based features alone cannot capture.
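
A per-hour error rate of this kind can be computed by grouping test predictions by departure hour. The sketch below assumes a 0.5 decision threshold and uses made-up records; the threshold, the function name, and the data are illustrative assumptions rather than the project's actual analysis code.

```python
from collections import defaultdict


def error_rate_by_hour(records, threshold=0.5):
    """Compute the misclassification rate per departure hour.

    `records` is an iterable of (dep_hour, true_label, predicted_probability).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for dep_hour, y_true, prob in records:
        y_pred = int(prob >= threshold)
        totals[dep_hour] += 1
        errors[dep_hour] += int(y_pred != y_true)
    return {hour: errors[hour] / totals[hour] for hour in sorted(totals)}


# Hypothetical predictions: (departure hour, true label, predicted probability).
records = [
    (3, 1, 0.30),   # delayed flight scored low: missed delay
    (3, 0, 0.20),   # correct
    (14, 0, 0.10),  # correct
    (14, 1, 0.80),  # correct
    (14, 0, 0.40),  # correct
    (22, 0, 0.70),  # on-time flight scored high: false alarm
]
print(error_rate_by_hour(records))  # {3: 0.5, 14: 0.0, 22: 1.0}
```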

Team

  • Elias Estacion
  • Rochane Hurst
  • Meliton Rojas
  • Bricio Blancas Salgado
  • Wendy Santiago
  • Michael Vu

Dataset

  • Source: U.S. Bureau of Transportation Statistics
  • Period: May – October 2025
  • Size: ~4.2 million flights
  • Target: Arrival delay >= 15 minutes
  • Features: 7 (schedule-based)
  • Split: 80% train / 10% val / 10% test
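
An 80/10/10 split like the one listed above can be sketched as a shuffled partition. Whether the project split the data randomly, with stratification, or chronologically is not stated, so the random shuffle, the seed, and the function name here are assumptions.

```python
import random


def train_val_test_split(rows, val_frac=0.10, test_frac=0.10, seed=42):
    """Shuffle rows and partition them into train, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


# Stand-in for flight records: 1,000 integer row ids.
train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```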

Features Used

  • DAY_OF_WEEK: Day the flight operates
  • DEP_HOUR: Scheduled departure hour
  • ARR_HOUR: Scheduled arrival hour
  • DISTANCE: Flight distance in miles
  • CARRIER_ENC: Airline (label encoded)
  • ORIGIN_ENC: Origin airport (label encoded)
  • DEST_ENC: Destination airport (label encoded)
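
Label encoding maps each distinct category to an integer id. The sketch below shows the idea for the airline column; assigning ids in sorted order is an assumption about how the project encoded its categories, and the carrier codes are sample values only.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer id, assigned in sorted order."""
    return {category: index
            for index, category in enumerate(sorted(set(values)))}


# Hypothetical airline codes as they might appear in the raw data.
carriers = ["DL", "AA", "UA", "AA", "WN"]
carrier_to_id = fit_label_encoder(carriers)
encoded = [carrier_to_id[c] for c in carriers]

print(carrier_to_id)  # {'AA': 0, 'DL': 1, 'UA': 2, 'WN': 3}
print(encoded)        # [1, 0, 2, 0, 3]
```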

LSTM Architecture

  1. Input sequence (7 timesteps x 1 feature)
  2. Bidirectional LSTM (64 units) + BatchNorm + Dropout
  3. Bidirectional LSTM (32 units) + BatchNorm + Dropout
  4. Dense (64) + Dense (32) classification head
  5. Sigmoid output: delay probability
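
To make the shapes in this stack concrete, the sketch below reshapes one flat row of 7 features into a 7 x 1 sequence and counts the weights of each layer using the standard LSTM and dense-layer formulas. It assumes the forward and backward LSTM outputs are concatenated (doubling the width), that the first recurrent layer passes a full sequence to the second, and that the second passes only its final state to the dense head. None of these details are stated above, and BatchNorm parameters are left out. The sample row values are hypothetical.

```python
def row_to_sequence(row):
    """Reshape a flat row of 7 features into 7 timesteps of 1 feature each."""
    return [[float(value)] for value in row]


def lstm_params(input_dim: int, units: int) -> int:
    """Weights in one standard LSTM layer.

    Four gates, each with input weights, recurrent weights and a bias.
    """
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weights in one fully connected layer (kernel plus bias)."""
    return input_dim * units + units


# Hypothetical encoded row, in the order of the feature list:
# DAY_OF_WEEK, DEP_HOUR, ARR_HOUR, DISTANCE, CARRIER_ENC, ORIGIN_ENC, DEST_ENC
seq = row_to_sequence([5, 18, 21, 733, 2, 41, 87])
print(len(seq), len(seq[0]))  # 7 1

# Assumption: bidirectional outputs are concatenated, so widths double.
bilstm1 = 2 * lstm_params(input_dim=1, units=64)    # output width 128
bilstm2 = 2 * lstm_params(input_dim=128, units=32)  # output width 64
dense1 = dense_params(64, 64)
dense2 = dense_params(64, 32)
output = dense_params(32, 1)                        # single sigmoid unit
print(bilstm1, bilstm2, dense1, dense2, output)  # 33792 41216 4160 2080 33
```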
Predictions use the actual trained Bidirectional LSTM model, loaded via TensorFlow.js. Logistic Regression and MLP values are approximated for comparison. Source code and notebooks are available in this repository.