Blog Title:
Detecting Network Anomalies with XGBoost and SMOTE: From Cybersecurity Logs to AI Models
Introduction
As someone transitioning from a cybersecurity background into AI, I recently challenged myself to turn raw network traffic into intelligent insights. The result? A complete machine learning pipeline that detects DoS (Denial-of-Service) attacks with 99.9%+ accuracy and AUC, built on top of real-world IoT traffic.
This project marks a key milestone in my journey: transforming my hands-on experience with logs and network security into a practical AI application.
What Problem Are We Solving?
Traditional intrusion detection systems (IDS) often fail to detect sophisticated or low-rate DoS attacks. Moreover, the volume of network logs and the class imbalance between normal and malicious traffic make this task even harder.
So I asked myself:
Can we use modern machine learning to detect anomalies directly from network logs?
Dataset: IoTID20-Extended (2024)
We used the IoTID20-Extended dataset, a recent and comprehensive collection of real IoT network traffic. It includes labeled flows representing normal traffic and various attack types, including DoS and DDoS.
Dataset link: Kaggle (IoTID20 Dataset)
Approach Overview
We designed an end-to-end pipeline with the following stages:
- Data Preprocessing: Handle missing values, encode categorical features, scale numerical ones.
- Feature Selection: Used SelectKBest to extract top predictive features.
- Class Balancing: Applied SMOTE to synthetically oversample underrepresented attack traffic.
- Model Training: Used XGBoost, known for performance on tabular datasets.
- Evaluation: 10-Fold Cross-Validation using F1-score and ROC-AUC.
Results
The model achieved:
- Accuracy: 100%
- F1 Score: 1.00
- ROC-AUC: 1.00
These results are exceptional, but they reflect a balanced, clean dataset. In real-world deployments, we'd expect slightly lower but still strong performance.
Confusion Matrix and ROC Curve plots were also generated (see GitHub).
Why This Matters
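For readers who want to reproduce this kind of diagnostic, here is a small sketch of how a confusion matrix and ROC curve can be computed with scikit-learn. It is an illustration only: the data is synthetic, and `LogisticRegression` is a stand-in for the trained XGBoost model, so the numbers it prints say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumption): about 20% positive (attack) samples.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stand-in classifier; any model exposing predict / predict_proba works here.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)               # hard labels for the confusion matrix
y_score = clf.predict_proba(X_test)[:, 1]  # attack-class probability for the ROC curve

cm = confusion_matrix(y_test, y_pred)      # rows: true class, columns: predicted class
tn, fp, fn, tp = cm.ravel()

fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

print(cm)
print(f"ROC-AUC: {auc:.3f}")
```

The `fpr` and `tpr` arrays can be passed to any plotting library to draw the curve, and `cm` can be rendered as a heatmap.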
This project shows that AI can effectively augment traditional network security, not just by detecting anomalies, but by learning from raw or semi-structured data like logs. It's a step toward AI-driven intrusion detection systems.
As a cybersecurity expert now stepping into AI, this fusion of domains is exactly where I plan to build next.
Try It Yourself
Full project code, notebook, and results are available on GitHub:
GitHub Repo: Log Anomaly Detection
Includes:
- Notebook with all steps
- Visual results
- Cleaned dataset path
- README.md + requirements.txt
Next Steps
This is just the beginning. My roadmap includes:
- Applying LLMs to raw .log files
- Integrating SHAP/LIME for model explainability
- Deploying real-time log anomaly detectors
- Combining clustering + classification in hybrid models
About Me
I'm Hazem Elbaz, a cybersecurity researcher shifting toward applied AI and intelligent automation in network defense.
Follow my journey of building real-world AI from the ground up at:
elbazhazem.github.io
Question for You
Have you tried using ML or AI in log analysis or cybersecurity? What tools or datasets worked for you?
Let's discuss in the comments.