Blogs | Hazem Elbaz
Academic and Professional Achievements – 2025
As 2025 comes to a close, it is important to document a year marked by exceptional academic and professional commitment, unfolding within highly non-traditional and constrained circumstances. The year was shaped by severe humanitarian, institutional, and technical challenges resulting from the ongoing war and its direct impact on the academic and research environment in Gaza.
Despite these conditions, 2025 became a year of academic continuity, strategic adaptation, and a deliberate shift toward more applied and impactful research directions.
1. Academic and Teaching Activities
Throughout 2025, I continued my work as a faculty member and researcher in the areas of:
- Cybersecurity
- Network and Cloud Security
- Log Analysis and Anomaly Detection
- AI-Driven Security Operations Center (SOC) Automation
This included university teaching, academic supervision of graduation projects and research, and the development of instructional content that bridges theoretical foundations with practical, industry-relevant applications.
2. Research and Development
On the research front, my work during 2025 focused on:
- Developing anomaly detection models using machine learning and deep learning techniques
- Integrating Large Language Models (LLMs) into security log analysis workflows
- Designing and prototyping AI-driven SOC automation frameworks
- Expanding and refining research papers targeting submission to peer-reviewed journals and conferences
A significant portion of this work was conducted under severe operational constraints, including power outages, limited connectivity, and restricted access to computational resources, necessitating highly adaptive research planning and execution strategies.
3. Transition Toward Applied and Open Research
A key milestone in 2025 was a methodological shift toward applied, open, and reproducible research, with emphasis on:
- Documenting research workflows and outputs through open-source GitHub repositories
- Building reusable and extensible research artifacts
- Aligning academic research with real-world SOC operational needs
- Promoting applied research culture among students and early-career researchers
This transition was driven by a strategic objective to maximize scientific, educational, and societal impact in resource-constrained environments.
4. Academic Leadership and Capacity Building
In parallel with research and teaching responsibilities, I continued to contribute to:
- Mentorship and academic guidance for students and graduates
- Participation in training and capacity-building initiatives
- Engagement in discussions related to research development and academic resilience
- Strategic planning for future initiatives aimed at creating more sustainable learning and research environments
Skills Strengthened During 2025
Academic and Professional Skills
- Applied research design and execution
- Scientific writing and peer-review processes
- Research supervision and team building
- Research project management in high-risk environments
Technical Skills
- AI-Driven SOC Automation
- Log Analysis and Anomaly Detection
- LLM Integration for Cybersecurity
- Dataset Engineering and Evaluation Pipelines
Year Summary
While 2025 was far from a conventional academic year, it proved to be a pivotal period for consolidating academic resilience, advancing applied research with tangible impact, and redefining the notion of achievement under fragile and constrained conditions.
Looking Ahead
As I move into 2026, my focus remains on:
- Establishing international research collaborations
- Securing stable and safe academic and research opportunities
- Producing high-quality scientific publications
- Developing research frameworks that address local challenges while adhering to international academic standards
Before You Build Detection… Make Sure You Have Collection
“There’s no detection without collection.”
This simple truth is one of the most overlooked principles in modern SOC operations.
🧠 Introduction
In every SOC I’ve seen, teams are eager to start writing use cases, mapping them to MITRE ATT&CK, creating SIEM rules, and claiming, “We’re ready to detect any attack.”
But too often, they skip the step that makes all of this possible: data collection.
Before your detection rules can work, your SOC must have a solid foundation of telemetry — without it, even the best detection logic will fail silently.
⚙️ Collection Comes Before Detection
A Security Operations Center is an engineered system. The detection layer can’t function if the foundation layer (Telemetry & Collection) is unstable.
You can have the most advanced correlation engine in the world, but if your critical systems aren’t generating or forwarding enough logs, your SOC will see nothing.
Every detection depends on observable facts from your sensors, agents, and integrations.
Missing even one essential source — such as Windows Security Event Logs, EDR process telemetry, or DNS traffic — can create massive blind spots.
This becomes critical during lateral movement or privilege escalation attempts, where visibility gaps can completely hide attacker activity.
🔍 Step One: Log Source Review
Before writing any detection rules, conduct a comprehensive log source review — not a superficial checklist, but a technical validation that answers:
- Is the source actually enabled and sending logs to the SIEM?
- Are the events complete, or are they truncated?
- Do the logs cover all necessary audit categories (authentication, file access, process creation, etc.)?
- Are the fields properly parsed and normalized?
This gives you a true view of data coverage, not the assumed one.
Only then can you safely connect your detection use cases to the log sources that actually support them.
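As a rough illustration of what that validation can look like in practice, here is a stdlib-only Python sketch. The source names and required-field sets are assumptions for the example, not a standard; in a real SOC they would come from your detection requirements and connector documentation.

```python
# Minimal sketch of a log source review: for each source, verify that
# sample events carry the fields your detections need and are fresh.
# Source names and required fields below are illustrative assumptions.
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = {
    "windows_security": {"EventID", "Account", "IpAddress", "TimeGenerated"},
    "dns": {"QueryName", "ClientIP", "TimeGenerated"},
}

def review_source(name, sample_events, max_age=timedelta(hours=1)):
    """Return a list of problems found for one log source."""
    problems = []
    if not sample_events:
        problems.append(f"{name}: no events received")
        return problems
    now = datetime.now(timezone.utc)
    for event in sample_events:
        missing = REQUIRED_FIELDS[name] - event.keys()
        if missing:
            problems.append(f"{name}: missing fields {sorted(missing)}")
        ts = event.get("TimeGenerated")
        if ts and now - ts > max_age:
            problems.append(f"{name}: stale event ({ts.isoformat()})")
    return problems
```

A report like this turns "we think the source is onboarded" into an evidence-backed statement about coverage.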
🗺️ When There’s No Asset Inventory or Network Diagram
This is one of the most common challenges for new SOC teams.
You enter an environment with thousands of devices and servers — but no updated CMDB, and no clear network map.
In that case, use a Bottom-Up Visibility Mapping approach: build visibility from the telemetry you already have.
Start from your existing logs in the SIEM or EDR and gradually reconstruct the environment:
- Identify active devices from endpoint data (
DeviceName, Hostname, or AgentID).
- Map communication patterns from firewall or proxy logs.
- Extract user-to-device associations from Active Directory sign-ins.
- Analyze outbound connections to spot systems exposed to the internet.
By doing this, you build a real-world inventory based on evidence — not assumptions — which becomes the backbone of your detection strategy.
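The mapping itself can start very small. The sketch below merges endpoint and sign-in telemetry into a per-device view; the record shapes (DeviceName, AgentID, UserName) are illustrative assumptions, not a fixed schema.

```python
# Sketch of bottom-up visibility mapping: reconstruct an asset inventory
# from telemetry already in the SIEM or EDR. Record shapes are assumed.
from collections import defaultdict

def build_inventory(endpoint_logs, signin_logs):
    """Merge endpoint and identity telemetry into a per-device view."""
    inventory = defaultdict(lambda: {"agent_ids": set(), "users": set()})
    for rec in endpoint_logs:                 # e.g. EDR agent events
        inventory[rec["DeviceName"]]["agent_ids"].add(rec["AgentID"])
    for rec in signin_logs:                   # e.g. Active Directory sign-ins
        inventory[rec["DeviceName"]]["users"].add(rec["UserName"])
    return dict(inventory)
```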
📡 Are You Collecting the Right Data?
The more diverse your telemetry, the stronger your detection capabilities.
Examples:
- Endpoint logs → reveal process executions and local activity.
- Network telemetry → exposes lateral movements.
- Identity logs → highlight suspicious access behavior.
- Cloud audit logs → track privileged operations in SaaS or IaaS.
Regularly review your schema coverage:
Make sure critical fields like UserPrincipalName, DeviceId, IPAddress, and Timestamp exist and are normalized.
Such consistency allows your correlation logic to connect dots accurately — the difference between catching an incident or missing an attack entirely.
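A simple way to enforce that consistency is an alias table that renames source-specific field names onto the shared schema before correlation. The alias table below is an illustrative assumption, not a vendor mapping:

```python
# Sketch of schema normalization: map source-specific names onto shared
# fields (UserPrincipalName, DeviceId, IPAddress, Timestamp) so that
# correlation logic can join events across sources. Aliases are assumed.
FIELD_ALIASES = {
    "upn": "UserPrincipalName", "user": "UserPrincipalName",
    "src_ip": "IPAddress", "client_ip": "IPAddress",
    "host_id": "DeviceId",
    "ts": "Timestamp", "event_time": "Timestamp",
}

def normalize(event):
    """Rename known aliases; keep already-normalized fields untouched."""
    return {FIELD_ALIASES.get(k, k): v for k, v in event.items()}
```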
🧩 Key Takeaways
- Visibility is the foundation of detection.
- Build your telemetry coverage before your rules.
- Review log sources as rigorously as you test detections.
- Correlate across data types to see the full attack surface.
No Collection, No Detection.
Every SOC’s power begins with what it can see.
🔗 Read Next
If you’re building your own SOC or starting your journey into SIEM and detection engineering, check out:
👉 Microsoft Sentinel Home Lab Setup | Step-by-Step Guide
A complete hands-on tutorial to deploy Microsoft Sentinel, connect data sources, and simulate real detections.
#CyberSecurity #SIEM #SOC #ThreatDetection #SecurityOperations
#MicrosoftSentinel #IncidentResponse #DetectionEngineering #CloudSecurity #SOCAnalysis
✨ New Milestone Achieved!
I’m pleased to share that I’ve successfully completed the “Business Model Canvas: A Tool for Entrepreneurs and Innovators” course from Kennesaw State University via Coursera.
Certificate: https://coursera.org/share/4bb3fb3a6f2e000cbeff4a6bbc0ea618
Course reference: https://www.coursera.org/learn/business-model-canvas/
Key Skills I Gained from this Training
- Business Model Design & Structuring
- Value Proposition Development
- Customer Segmentation & Market Fit Thinking
- Go-to-Market Strategy Logic
- Revenue Streams & Cost Structure Mapping
- Lean Startup mindset & hypothesis testing
- Translating technical solutions into investor-friendly business language
These skills are extremely relevant to my current direction in AI-Driven SOC Automation — helping me bridge between cybersecurity research and business execution.
This course was a strong addition to my learning pipeline, especially as I continue shaping my upcoming technology venture and refining the business logic behind productizing AI-SOC solutions.
Question to my network:
What course or recent learning experience gave you a major “perspective shift” in connecting technical work to business value?
#ContinuousLearning #BusinessModelCanvas #CyberSecurity #AISOC #Entrepreneurship #ProfessionalGrowth
🧠 Building a SOC Home Lab from Zero — Catching Real Attackers on Azure
“Every attack is a lesson — the key is building systems that learn faster than attackers do.”
— Dr. Hazem A. Elbaz
🚀 Introduction
In this post, I’ll walk you through one of my most exciting hands-on projects — building a Security Operations Center (SOC) from scratch using Microsoft Azure’s free tier and Microsoft Sentinel.
This project is not just theoretical; it captures real-world cyberattacks and transforms them into actionable intelligence through dashboards and live maps.
Whether you’re a cybersecurity student, SOC analyst, or researcher, this lab is an ideal starting point to explore how professional SOC environments detect, collect, and analyze threats in real time.
🏗️ Why I Built This Project
After years of teaching and researching cybersecurity, I wanted to design a lab that:
- Bridges theory and reality — by exposing a honeypot to actual attackers.
- Empowers learners — to build and observe a functioning SOC environment.
- Showcases portfolio-ready skills — for anyone pursuing a cybersecurity career.
By using Azure’s free resources, anyone can replicate this setup safely and affordably.
🔍 Project Overview
Here’s what the home SOC includes:
| Component | Description |
| --- | --- |
| Azure Subscription (Free Tier) | Deploys all resources at zero cost. |
| Honeypot VM | A Windows 10 machine deliberately exposed to attackers. |
| Log Analytics Workspace (LAW) | Centralized log storage and analysis engine. |
| Microsoft Sentinel | SIEM platform for correlation, alerting, and visualization. |
| Live Attack Map | Displays attack origins in real time. |
⚙️ Step-by-Step Highlights
1️⃣ Setting up Azure
Create a free Azure subscription and configure:
- Resource Group
- Virtual Network (VNet)
- Virtual Machine (Windows 10 Honeypot)
2️⃣ Deploying the Honeypot
Expose the VM intentionally:
- Delete RDP security rules.
- Allow all inbound traffic.
- Disable Windows Firewall to attract attacks.
⚠️ This should be done only in an isolated lab environment.
3️⃣ Observing Attacks
Within minutes, automated bots start brute-forcing your VM.
Monitor Event ID 4625 (Failed Login) using Windows Event Viewer:
- Username attempted
- IP address
- Failure reason
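Once you export those events (for example as CSV from Event Viewer), a few lines of Python can tally attempts per source IP. The record layout below is an assumption about the export format, not the exact Event Viewer schema:

```python
# Sketch: tally failed logons (Event ID 4625) per source IP from exported
# events, to see which addresses are brute-forcing the honeypot.
from collections import Counter

def top_attackers(events, n=3):
    """Return the n most active source IPs among 4625 events."""
    hits = Counter(
        e["IpAddress"] for e in events if e.get("EventID") == 4625
    )
    return hits.most_common(n)
```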
4️⃣ Integrating with Sentinel
Use the Azure Monitor Agent to forward logs to Log Analytics Workspace.
Then, connect Sentinel to the workspace for correlation and visualization.
Sample KQL Query:
SecurityEvent
| where EventID == 4625
| project TimeGenerated, Account, IpAddress = tostring(parse_json(AdditionalFields)["IpAddress"])
| sort by TimeGenerated desc
5️⃣ Enriching Data with GeoIP
Import geoip-summarized.csv as a Sentinel Watchlist to map attacks to their geographic origins.
6️⃣ Visualizing Attacks
Create a custom Sentinel Workbook using map.json to generate a live global attack map.
You’ll see where attackers are coming from — in real time.
📊 Results and Insights
Within hours of exposure, the honeypot began receiving:
- Hundreds of failed login attempts.
- Attack sources from over 50 countries.
- Common usernames like admin, test, and employee.
These logs reflect the global nature of cyber threats and demonstrate how SOCs continuously analyze suspicious activities to safeguard systems.
🧠 Lessons Learned
- Attack simulation is a powerful learning tool.
- Understanding Event ID 4625 is essential for brute-force detection.
- KQL is a must-know language for any SOC analyst.
- Visual dashboards turn complex data into clear stories for decision-makers.
🧩 Next Steps
Future enhancements:
- Integrate Sysmon for deeper telemetry.
- Automate alerts with Logic Apps.
- Extend to multi-cloud monitoring (AWS / GCP).
- Apply AI models or LLMs to summarize log anomalies.
📖 Full Documentation
All setup instructions, queries, and diagrams are available in the public repository:
👉 GitHub: SOC Home Lab from Zero
For a detailed tutorial and reflections, read the Medium article:
👉 Building a SOC Home Lab from Zero — Catching Real Attackers on Azure
🌐 About the Author
Dr. Hazem A. Elbaz
Assistant Professor of Cybersecurity | SOC Automation Researcher | AI-SOC Founder
Website • LinkedIn • GitHub
Unveiling LLM-SOC-Agent: Revolutionizing Security Operations with AI
In the ever-evolving landscape of cybersecurity, Security Operations Centers (SOCs) are constantly battling an increasing volume and sophistication of threats. The manual burden on analysts is immense, leading to alert fatigue and a struggle to keep pace. This is precisely where the LLM-SOC-Agent project steps in, aiming to transform traditional SOC operations through the power of Large Language Models (LLMs) and intelligent automation.
The LLM-SOC-Agent, an integral part of the broader AI-SOC-Automation initiative, is an open-source endeavor focused on building a multi-agent security framework. This project envisions a future where LLMs act as intelligent assistants, capable of analyzing vast amounts of security data, generating comprehensive insights, and even executing response actions autonomously.
What is LLM-SOC-Agent?
At its core, LLM-SOC-Agent leverages multiple LLM models to analyze and generate security briefs, effectively acting as an AI-driven SOC analyst. The project’s goal is to go beyond simple text generation, enabling LLMs to understand context, reason through security scenarios, and make informed decisions.
Key features and functionalities being developed within LLM-SOC-Agent include:
- Threat Intelligence Analysis: Processing and summarizing threat intelligence data to provide actionable insights on emerging threats.
- Log Analysis: Identifying anomalies and suspicious activities within vast volumes of log data.
- Vulnerability Assessment: Assessing vulnerabilities and summarizing critical exposures.
- Incident Response: Evaluating security incidents and recommending appropriate response actions.
- Overseer Summary: Generating a final, consolidated summary brief based on the outputs of various specialized agents.
The project emphasizes a modular design, allowing for individual agents to handle specific tasks and then collaborate to achieve complex security objectives. This agentic approach is crucial for breaking down intricate security problems into manageable, AI-addressable components.
Diving into the Code Repository
The LLM-SOC-Agent GitHub repository (https://github.com/ai-soc-automation/LLM-SOC-Agent) is where the magic happens. While the specifics of the code structure can evolve, you’ll typically find:
- Agent Modules: Python scripts or directories dedicated to each specialized agent (e.g., threat_intel_agent.py, log_analysis_agent.py). These modules likely contain the logic for interacting with LLMs, processing specific data types, and generating targeted outputs.
- Core Orchestration: Files responsible for coordinating the activities of different agents, defining workflows, and managing the overall execution flow. This might involve setting up communication channels between agents and handling the aggregation of their individual analyses.
- Data Handling: Scripts or utilities for data ingestion, preprocessing, and formatting to prepare security data for LLM consumption. The project currently reads .txt files, but future iterations could involve integration with SIEMs, threat intelligence platforms, and other security tools.
- Configuration: Files to manage API keys, model selections (e.g., local LLMs via Ollama, or cloud-based LLMs like those from Together API), and other project settings.
- Examples and Demos: Sample data and scripts to showcase the agent’s capabilities and provide a starting point for users and contributors.
The development often involves leveraging LLM frameworks to simplify the process of building intelligent agents, managing their memory, decision-making processes, and tool integrations. This allows the project to focus on the security-specific logic rather than reinventing the wheel for LLM interactions.
Contributing to the Future of SOC Automation
The LLM-SOC-Agent project is a fantastic opportunity for anyone passionate about cybersecurity, AI, and open-source development. Contributions are welcomed from individuals with diverse skill sets, including:
- Cybersecurity Analysts/Engineers: Provide domain expertise, define use cases, and validate the accuracy and effectiveness of the AI agents.
- Machine Learning Engineers/Data Scientists: Develop and fine-tune LLM models, implement new anomaly detection algorithms, and improve the overall intelligence of the agents.
- Software Developers: Build new agent modules, enhance existing code, integrate with other security tools, and improve the project’s scalability and robustness.
- Researchers: Explore novel applications of LLMs in cybersecurity, contribute to the theoretical foundations, and propose innovative solutions.
If you’re looking to make a tangible impact on the future of security operations and work with cutting-edge AI technologies, the LLM-SOC-Agent project offers a collaborative environment to learn, build, and innovate. Check out the GitHub repository, explore the existing code, and don’t hesitate to engage with the community to find out how you can contribute!
This is more than just a coding project; it’s about building the next generation of intelligent SOCs, empowering security professionals, and strengthening our defenses against evolving cyber threats.
From Alert Fatigue to Smart Triage: Building an LLM‑Powered SOC Agent
“Security teams drown in tens of thousands of alerts every day. What if a lightweight language model could triage them for you in real time?”
1. The Pain: Alert Overload & MTTR
Security Operations Centers (SOCs) rely on SIEM and SOAR tools, but rule‑based playbooks often miss context, generating floods of false positives. Analysts spend hours weeding out noise, and Mean‑Time‑To‑Respond (MTTR) balloons.
2. Our Idea: Context‑Aware Enrichment With LLMs
We fine‑tuned DistilRoBERTa using LoRA adapters on a blended corpus of _CIC‑IDS 2018_ logs and our synthetic SOC‑Sim stream. The agent:
- Enriches each alert with entity context (IP reputation, MITRE ATT&CK techniques).
- Clusters alerts that share root cause, shrinking queue length.
- Prioritises by assigning a risk score using chain‑of‑thought prompting.
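To make the enrich / cluster / prioritise flow concrete, here is a deliberately simplified sketch. The reputation table and scoring weights are placeholders; in the actual agent the enrichment and risk scoring come from the fine-tuned model, not a lookup.

```python
# Rough sketch of the triage flow: enrich each alert with entity context,
# cluster alerts that share a root-cause key, then rank clusters by risk.
# BAD_IPS and the scoring weights are illustrative assumptions.
from collections import defaultdict

BAD_IPS = {"203.0.113.9"}  # stand-in for an IP reputation feed

def triage(alerts):
    clusters = defaultdict(list)
    for a in alerts:
        a["ip_reputation"] = "bad" if a["src_ip"] in BAD_IPS else "unknown"
        clusters[(a["src_ip"], a["technique"])].append(a)  # root-cause key
    ranked = []
    for key, group in clusters.items():
        # crude risk score: cluster size plus a reputation bonus
        score = len(group) + (5 if group[0]["ip_reputation"] == "bad" else 0)
        ranked.append({"key": key, "alerts": group, "risk": score})
    return sorted(ranked, key=lambda c: c["risk"], reverse=True)
```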
3. Architecture
┌──────────┐      ┌────────────┐      ┌─────────────┐
│   Logs   │─────▶│ Preprocess │─────▶│ LLM Enrich  │
└──────────┘      └────────────┘      └─────┬───────┘
                                            │ Clusters
                                            ▼
                                      ┌─────────────┐
                                      │ Prioritise  │
                                      └─────┬───────┘
                                            ▼
                                   Analyst Dashboard
(A detailed diagram with component icons will be released in the repo’s /docs.)
4. Dataset & Training Pipeline
| Dataset | Records | Label Strategy | Notes |
| --- | --- | --- | --- |
| CIC‑IDS 2018 | 2.9 M | Original attack labels | Cleaned & deduped |
| SOC‑Sim | 1.2 M | Synthetic MITRE mapping | Covers phishing, ransomware |
Training lasted 4 h on a single RTX 4090. LoRA reduced GPU memory to < 12 GB.
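The memory saving is easy to sanity-check with back-of-the-envelope arithmetic: a full update to a d × k weight matrix trains d·k parameters, while LoRA trains two thin matrices B (d × r) and A (r × k), i.e. r·(d + k). The sketch below uses DistilRoBERTa's 768-dimensional hidden size and an illustrative rank of r = 8 (the post does not fix a rank), so treat the numbers as an approximation per layer.

```python
# Back-of-the-envelope for why LoRA fits on one GPU: compare full
# fine-tuning parameters (d*k) with LoRA's low-rank update (r*(d+k)).
def lora_params(d, k, r):
    full = d * k
    lora = r * (d + k)
    return full, lora, lora / full

# One 768x768 attention projection with rank-8 adapters (assumed values)
full, lora, ratio = lora_params(d=768, k=768, r=8)
```

For that single layer the adapter trains roughly 2 % of the parameters a full update would, which is why the whole run stayed under 12 GB of GPU memory.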
5. Early Results
| Metric | Rule‑based SOAR | LLM‑SOC‑Agent | Δ |
| --- | --- | --- | --- |
| MTTR (median) | 47 min | 32 min | ↓ 32 % |
| False positives | 18 % | 11 % | ↓ 7 pp |
| Analyst effort (alerts/day) | 1 200 | 820 | ↓ 31 % |
6. What’s Next
- Real‑time Zeek telemetry ingest
- Adversarial robustness testing (IBM ART)
- Feedback loop to fine‑tune on analyst decisions
7. Call to Action
This post is part of my ongoing series on AI‑Driven SOC Automation. Browse the entire journey on the AI‑SOC page.
Detecting Network Anomalies with XGBoost and SMOTE: From Cybersecurity Logs to AI Models
🧠 Introduction
As someone transitioning from a cybersecurity background into AI, I recently challenged myself to turn raw network traffic into intelligent insights. The result? A complete machine learning pipeline that detects DoS (Denial-of-Service) attacks with 99.9%+ accuracy and AUC, built on top of real-world IoT traffic.
This project marks a key milestone in my journey — transforming my hands-on experience with logs and network security into a practical AI application.
🔍 What Problem Are We Solving?
Traditional intrusion detection systems (IDS) often fail to detect sophisticated or low-rate DoS attacks. Moreover, the volume of network logs and the class imbalance between normal and malicious traffic make this task even harder.
So I asked myself:
Can we use modern machine learning to detect anomalies directly from network logs?
💾 Dataset: IoTID20-Extended (2024)
We used the IoTID20-Extended dataset, a recent and comprehensive collection of real IoT network traffic. It includes labeled flows representing normal and various attack types — including DoS and DDoS.
📌 Dataset link: Kaggle – IoTID20 Dataset
🛠️ Approach Overview
We designed an end-to-end pipeline with the following stages:
1. Data Preprocessing: handle missing values, encode categorical features, and scale numerical ones.
2. Feature Selection: used SelectKBest to extract the top predictive features.
3. Class Balancing: applied SMOTE to synthetically oversample the underrepresented attack traffic.
4. Model Training: used XGBoost, known for its performance on tabular datasets.
5. Evaluation: 10-fold cross-validation using F1-score and ROC-AUC.
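The class-balancing step is the least intuitive one, so here is a stdlib-only sketch of what SMOTE actually does: it creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The real pipeline uses imbalanced-learn's SMOTE implementation; this just shows the idea.

```python
# Illustrative, stdlib-only sketch of SMOTE-style oversampling.
# Assumes at least two minority samples, given as numeric tuples.
import math
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region the attacks already occupy instead of duplicating identical rows.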
📈 Results
The model achieved:
- ✅ Accuracy: 100%
- ✅ F1 Score: 1.00
- ✅ ROC-AUC: 1.00
These results are exceptional, but they reflect a balanced, clean dataset. In real-world deployments, we’d expect slightly lower but still strong performance.
📊 Confusion Matrix and ROC Curve plots were also generated (see GitHub).
💡 Why This Matters
This project proves that AI can effectively augment traditional network security — not just by detecting anomalies, but by learning from raw or semi-structured data like logs. It’s a step toward AI-driven intrusion detection systems.
As a cybersecurity expert now stepping into AI, this fusion of domains is exactly where I plan to build next.
📂 Try It Yourself
Full project code, notebook, and results are available on GitHub:
🔗 GitHub Repo – Log Anomaly Detection
Includes:
- Notebook with all steps
- Visual results
- Cleaned dataset path
- README.md + requirements.txt
🚀 Next Steps
This is just the beginning. My roadmap includes:
- Applying LLMs to raw .log files
- Integrating SHAP/LIME for model explainability
- Deploying real-time log anomaly detectors
- Combining clustering + classification in hybrid models
👨💻 About Me
I’m Hazem Elbaz, a cybersecurity researcher shifting toward applied AI and intelligent automation in network defense.
🧭 Follow my journey of building real-world AI from the ground up at:
🔗 elbazhazem.github.io
❓Question for You
Have you tried using ML or AI in log analysis or cybersecurity? What tools or datasets worked for you?
👇 Let’s discuss in the comments.
My Roadmap: From Cybersecurity to Applied AI
I’ve spent most of my career in cybersecurity.
In 2024, I decided to pivot — not away from cyber, but toward AI-powered security.
Here’s my roadmap for the transition.
🎯 Step 1: Define a Use Case
I didn’t start with models — I started with a problem:
“How can I make logs easier to understand and analyze?”
That became my first AI-for-cyber project.
📚 Step 2: Learn the Basics of AI/ML
I focused on:
- Python for data and APIs
- Numpy, Pandas
- Scikit-learn for traditional models
- HuggingFace + OpenAI for LLMs
- LangChain for chaining prompts
🔬 Step 3: Build Something Small, Fast
→ Log Analyzer LLM
This was my MVP to apply what I learned.
📈 Step 4: Go Deeper
I’m now:
- Learning clustering + classification
- Experimenting with fine-tuning
- Studying academic papers
- Rebuilding my GitHub profile with applied projects
🧠 Step 5: Share, Reflect, and Publish
This blog is part of that effort.
I’m also:
- Writing a research paper
- Applying for academic/industry roles
- Building a public portfolio of AI + cybersecurity tools
Lessons So Far
- AI is not a destination — it’s a toolkit
- Focus on usefulness, not hype
- You don’t need a PhD to start
🔍 Curious about my work?
Check out my projects or connect on LinkedIn.
Why Cybersecurity Needs AI More Than Ever
Today’s cybersecurity teams are overloaded:
- 📈 Alert fatigue
- ⌛ Shortage of skilled analysts
- 🚨 False positives everywhere
- 🕵️‍♂️ Sophisticated, evasive threats
In a modern SOC, the real challenge isn’t detection — it’s prioritization and interpretation.
Where AI Can Help
1. Intelligent Summarization
LLMs can:
- Digest 500 lines of logs
- Summarize what happened
- Highlight what matters
2. Threat Contextualization
Instead of just “block port 443”, LLMs can explain:
“This appears to be a reverse shell attempt based on behavior and timing.”
3. Automation of Repetitive Work
- Categorize phishing emails
- Triage alerts
- Recommend mitigation steps
All these can be supported by fine-tuned models or simple LLM prompts.
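None of this requires a complex pipeline to prototype. As a hedged sketch, the helper below only assembles a chat-style triage prompt; wiring it to an actual completions API (OpenAI, or a local model) is left out, and the category list is an invented example.

```python
# Sketch: assemble a triage prompt for an LLM. Categories and wording
# are illustrative assumptions; no API call is made here.
def build_triage_prompt(alert_text):
    system = (
        "You are a SOC analyst. Classify the alert as one of: "
        "phishing, brute-force, malware, benign. "
        "Then recommend one mitigation step."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Alert:\n{alert_text}"},
    ]
```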
This is Not a Future Vision — It’s Now
Tools like:
- GPT-4 + Python
- LangChain + SIEM integrations
- Vector databases + threat intel
are already being tested in production environments.
But Beware the Hype
AI ≠ Magic.
- It needs validation
- It requires tuning
- It must be explainable
Final Thought
Cybersecurity needs more than automation.
It needs intelligent augmentation — and that’s where AI shines.
The future analyst is part human, part machine.
🤖 I’m exploring this space deeply in my own projects — see Log Analyzer LLM
🔁 Let’s co-build the next-gen SOC tools.
Lessons from Building My First Log Analyzer with GPT-4
When I started building Log Analyzer LLM, I had one goal:
“Make logs readable, fast.”
Logs are noisy, verbose, and contextless. I wanted an AI assistant that could summarize logs and highlight meaningful events — something traditional SIEMs don’t do well.
Here’s what I learned along the way.
1. Structure Matters More Than You Think
The biggest challenge?
Logs are not standardized.
Some logs are JSON. Others are multiline strings, or worse — key-value chaos.
Solution:
I started by building simple pre-processing steps to:
- Remove noise and timestamps
- Break logs into chunks
- Group them by similarity
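The pre-processing steps above can be sketched in a few lines. The timestamp regex below is an assumption that only covers ISO-style prefixes; real logs need per-source patterns.

```python
# Sketch of log pre-processing: strip ISO-style timestamps, drop empty
# noise lines, and chunk the remainder for the LLM.
import re

TS = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\S*\s*")

def preprocess(raw, chunk_size=3):
    lines = []
    for line in raw.splitlines():
        line = TS.sub("", line).strip()
        if line:                        # drop blank lines
            lines.append(line)
    # fixed-size chunks; similarity grouping would replace this step
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
```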
2. Prompt Engineering Is Critical
LLMs are powerful — but they need guidance.
💡 I tested several prompts:
- “Summarize these log lines in plain English.”
- “Detect any anomalies in these logs.”
- “Explain what this log segment means.”
Best results came from combining:
- System-level instructions (e.g., “You are a cybersecurity analyst.”)
- Contextual samples (few-shot prompting)
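That winning combination, a system instruction plus few-shot samples, can be assembled like this (assuming an OpenAI-style message format; the few-shot pair is invented for illustration):

```python
# Sketch: compose a chat prompt with a system instruction and a
# few-shot example. The example pair is an invented placeholder.
FEW_SHOT = [
    ("ERROR disk /dev/sda1 read failure sector 1024",
     "Hardware fault: repeated read errors on /dev/sda1."),
]

def build_messages(log_segment):
    msgs = [{"role": "system", "content": "You are a cybersecurity analyst."}]
    for log, answer in FEW_SHOT:            # few-shot context
        msgs.append({"role": "user", "content": log})
        msgs.append({"role": "assistant", "content": answer})
    msgs.append({
        "role": "user",
        "content": f"Summarize these log lines in plain English:\n{log_segment}",
    })
    return msgs
```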
3. Don’t Trust the AI Blindly
LLMs hallucinate. Always.
Sometimes the model summarized error logs as “successful operations”.
Other times it guessed at causes.
🚨 Lesson: Always cross-check with known events or ground truth.
4. Python + OpenAI = Fast Prototyping
I used:
- openai Python SDK
- A simple .log file reader
- Streamlit for UI (optional)
Within hours, I had a working proof of concept.
What’s Next?
- Auto-grouping similar events (clustering)
- Hybrid models: rules + LLMs
- Anomaly scoring
- Integration with real-time log streams
Building with GPT-4 taught me one thing:
AI isn’t perfect, but it’s incredibly useful if used right.
If you’re in cybersecurity, you should start experimenting.
🧠 Repo: Log Analyzer LLM
💬 DM me if you’re building something similar — let’s connect.
A personal reflection on why I moved from traditional log analysis tools to LLM-powered log insight engines.
For years, I worked with SIEM platforms, firewalls, EDRs, and an endless stream of logs.
Like most security professionals, I knew the routine:
- Collect logs from FortiGate, Palo Alto, and EDRs like CrowdStrike
- Push everything to a SIEM (Splunk, QRadar, Elastic)
- Spend hours writing correlation rules
- Sift through hundreds of alerts, only to find most were noise
At some point, I asked myself:
What if the system could “understand” what the logs are saying, not just parse them?
The Limits of Traditional SIEMs
Traditional SIEMs are excellent at:
- Collecting and indexing data
- Applying static rules
- Alerting on predefined patterns
But they lack context.
They don’t understand language.
They can’t summarize, explain, or infer like a human analyst.
That’s where Large Language Models (LLMs) come in.
Why LLMs?
LLMs, like GPT-4, bring something new to the table:
✅ They can summarize complex log entries
✅ Extract anomalies or outliers
✅ Translate logs into natural language
✅ Work across diverse sources without custom parsers
It’s not magic — it’s structured prompting, validation, and iteration.
My First Attempt: Log Analyzer LLM
That’s why I built my first prototype:
Log Analyzer with LLM
It:
- Reads raw .log files
- Uses LLMs to summarize events
- Highlights unusual patterns
- Provides insights in seconds — not hours
It’s still early, but the potential is clear:
“I’m not replacing the SOC analyst — I’m giving them a second brain.”
What’s Next?
- Automating anomaly detection with AI
- Combining clustering + LLMs
- Building a hybrid SOC assistant
- Publishing research on real-world results
If you’ve ever been overwhelmed by thousands of logs and dozens of dashboards —
LLMs might be the tool you’ve been waiting for.
📬 Have thoughts? Want to collaborate?
Find me on LinkedIn or explore my next AI-for-cybersecurity projects.