Lab Log 006 — Zeek_triage.py: Automated Threat Detection

Section 01

Overview

Labs 003 through 005 established a manual triage workflow: DHCP for machine identity, Kerberos for username, HTTP and TLS logs for suspicious activity, conn.log for data volumes. Each step required applying the right filter or command, reading the output, and making a judgment call. The workflow is effective but depends entirely on the analyst remembering the sequence and executing it correctly under time pressure.

This lab automates that workflow into a single Python script, zeek_triage.py, that reads a Zeek log folder and produces a complete color-coded threat report in under one second. The script was built with AI assistance: the problem was defined, the logic was directed, and every function was reviewed and validated against confirmed findings from the previous labs. The output was verified against pcaps from two known malware families before being considered reliable.

Script Configuration

Language	Python 3.14
Dependencies	None (stdlib only: os, sys, collections)
Input	Any Zeek log directory
Output	Color-coded terminal threat report
Runtime	Under 1 second on 51,000-packet captures
Platform	macOS · Linux · Any Python 3.x system

On AI-assisted development: AI tooling is now a common way for security professionals to accelerate scripting work. The skill is not typing code from memory; it is understanding what to build, directing the logic, reviewing the output for correctness, and validating against real data. This script was built that way. Every function can be explained line by line because the logic was understood before it was written, not after.

Section 02

Script Architecture

The script is organized into the six functions listed below, plus the print_report output routine and a main entry point. Apart from the shared parser, each function maps directly to one step of the manual triage workflow from Lab Log 003 and Lab Log 005.

parse_zeek_log(filepath)

Core parser — reads any Zeek log file and returns a list of dictionaries, one per row, keyed by column name. Used by every other function. Skips all # header lines automatically.

get_host_identity(log_dir)

Reads dhcp.log — extracts MAC address, hostname, and assigned IP from the first DHCP record. Returns None safely if the file doesn't exist.

get_username(log_dir)

Reads kerberos.log — finds the first successful AS-REQ and extracts the CNameString username by splitting on the domain separator.

get_suspicious_http(log_dir)

Reads http.log — deduplicates requests by host, filters out known-safe vendors, returns unique suspicious hosts with request counts and methods.

get_suspicious_tls(log_dir)

Reads ssl.log — matches server_name SNI field against a list of high-abuse TLDs. Returns unique suspicious domains with IP and TLD classification.

get_data_volumes(log_dir, ips)

Reads conn.log — sums bytes sent and received for each suspicious IP across all connections. Quantifies the scope of exfiltration in human-readable format.
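
The core parser described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual source: the parse_zeek_lines helper, the empty-list return for a missing file, and the trimmed sample log are assumptions made for the example. It relies on Zeek's default tab-separated format, where the #fields header line names the columns.

```python
import os


def parse_zeek_lines(lines):
    """Hypothetical helper: turn Zeek TSV lines into a list of row dicts."""
    fields = []
    rows = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#fields"):
            # The first token is the literal "#fields"; the rest are column names.
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#"):
            # Skip every other header line (#separator, #types, #close) and blanks.
            continue
        rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def parse_zeek_log(filepath):
    """Read one Zeek log file; a missing file yields an empty list (assumption)."""
    if not os.path.exists(filepath):
        return []
    with open(filepath, "r", encoding="utf-8", errors="replace") as handle:
        return parse_zeek_lines(handle)


# Illustrative, heavily trimmed dhcp.log. Real logs carry more columns.
SAMPLE_DHCP = "\n".join([
    "#separator \\x09",
    "#fields\tts\tmac\thost_name\tassigned_addr",
    "#types\ttime\tstring\tstring\taddr",
    "1.0\t00:21:5d:c8:0e:f2\tDESKTOP-ES9F3ML\t10.1.21.58",
    "#close\t2026-01-31-00-00-00",
])

rows = parse_zeek_lines(SAMPLE_DHCP.splitlines())
print(rows[0]["host_name"])  # DESKTOP-ES9F3ML
```

Because every row is a dictionary keyed by column name, the downstream functions can ask for row.get("host_name") without knowing the column's position in the file.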

Key Python Concepts Applied

Concept	Where Used	Why It Matters
with open()	parse_zeek_log	Guarantees file closes even if an error occurs — prevents resource leaks when processing many log files
dict(zip())	parse_zeek_log	Pairs column names with row values to create named dictionaries — enables row["hostname"] instead of row[6]
row.get(key, default)	All functions	Safe dictionary lookup — returns default value instead of crashing if a field is missing from a log row
any() + generator	get_suspicious_http	Checks if any safe vendor keyword appears in the hostname — concise one-line replacement for a nested loop
set()	get_suspicious_tls	Stores only unique values — prevents the same suspicious domain appearing multiple times in the report
defaultdict	get_data_volumes	Automatically creates a new entry for any unseen IP key — eliminates KeyError without manual initialization
str.split()	get_username	Splits "gwyatt/WIN11OFFICE.COM" on the slash to extract just the username — opposite of join()
ANSI escape codes	print_report	Terminal color output with no external libraries — red for confirmed threats, orange for caution, blue for structure
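
Three of the concepts in the table can be shown together in a short sketch. The helper names (extract_username, to_int, sum_volumes, format_bytes), the 1024-based units, and the sample rows are assumptions for illustration. The column names id.resp_h, orig_bytes, and resp_bytes are the standard conn.log fields.

```python
from collections import defaultdict


def extract_username(client):
    """Split 'user/REALM' from kerberos.log and keep only the user part."""
    return client.split("/")[0]


def to_int(value):
    """Zeek writes '-' for unset fields; count those as zero bytes."""
    return int(value) if value.isdigit() else 0


def sum_volumes(conn_rows, suspicious_ips):
    """Total bytes sent and received per suspicious destination IP."""
    # defaultdict builds a fresh counter the first time an IP is seen.
    totals = defaultdict(lambda: {"sent": 0, "received": 0, "connections": 0})
    for row in conn_rows:
        ip = row.get("id.resp_h", "-")
        if ip not in suspicious_ips:
            continue
        totals[ip]["sent"] += to_int(row.get("orig_bytes", "-"))
        totals[ip]["received"] += to_int(row.get("resp_bytes", "-"))
        totals[ip]["connections"] += 1
    return dict(totals)


def format_bytes(count):
    """Human-readable size, assuming 1024-based units."""
    if count >= 1024 * 1024:
        return f"{count / (1024 * 1024):.2f} MB"
    if count >= 1024:
        return f"{count / 1024:.1f} KB"
    return f"{count} bytes"


# Hypothetical conn.log rows, already parsed into dictionaries.
sample_rows = [
    {"id.resp_h": "45.131.214.85", "orig_bytes": "1000", "resp_bytes": "-"},
    {"id.resp_h": "45.131.214.85", "orig_bytes": "536", "resp_bytes": "532"},
    {"id.resp_h": "8.8.8.8", "orig_bytes": "90", "resp_bytes": "120"},
]
volumes = sum_volumes(sample_rows, {"45.131.214.85"})
print(extract_username("gwyatt/WIN11OFFICE.COM"))      # gwyatt
print(format_bytes(volumes["45.131.214.85"]["sent"]))  # 1.5 KB
```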

Section 03

Color System — Meaning Over Decoration

Terminal output colors are only useful if they mean something consistent. A color system that uses red for some things and not others, or that changes meaning between sections, forces the analyst to read carefully rather than scan quickly. The script uses three colors with strict single meanings, plus dim and uncolored text for supporting context and neutral facts:

Color	Meaning	Applied To
Red	Confirmed threat — act immediately	POST requests, suspicious domain names, compromised username, data exfiltration over 1MB, CRITICAL severity
Orange	Caution — investigate further	Data volumes between 100KB and 1MB, HIGH and MEDIUM severity ratings
Blue	Structure only — not a threat indicator	Section headers exclusively
Dim	Secondary context	URIs, IP addresses, dividers — supporting information that doesn't require immediate attention
No color	Neutral facts	Host IP, MAC address, hostname, GET requests, connection counts

Design principle: An analyst under pressure should be able to open the terminal report and know in under five seconds whether the severity is CRITICAL or LOW based on color alone — without reading a single word. Red anywhere in the output means stop and investigate. No red means continue normal operations. The color system was designed to support that scan-first behavior.
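
One way to encode the color rules from the table is sketched below. The specific escape sequences, the helper names, and the exact threshold boundaries are assumptions. In particular, orange is not one of the basic ANSI colors, so the sketch uses 256-color code 208, which may differ from what the script emits.

```python
# ANSI escape sequences (assumed values; the script's own codes may differ).
RED = "\033[91m"
ORANGE = "\033[38;5;208m"  # 256-color palette entry, since basic ANSI has no orange
BLUE = "\033[94m"
DIM = "\033[2m"
RESET = "\033[0m"

KB = 1024
MB = 1024 * KB


def paint(text, color):
    """Wrap text in a color code, or return it unchanged for 'no color'."""
    return f"{color}{text}{RESET}" if color else text


def volume_color(sent_bytes):
    """Red above 1 MB, orange from 100 KB to 1 MB, otherwise uncolored."""
    if sent_bytes > MB:
        return RED
    if sent_bytes >= 100 * KB:
        return ORANGE
    return ""


def method_color(method):
    """POST requests are red; GET and everything else stay neutral."""
    return RED if method == "POST" else ""


print(paint("[ DATA VOLUMES TO SUSPICIOUS IPs ]", BLUE))
print(paint("2.10 MB", volume_color(int(2.10 * MB))))
print(paint("POST", method_color("POST")), paint("/fakeurl.htm", DIM))
```

Keeping the color decision in small functions, separate from the print calls, makes it harder for one section of the report to drift away from the single-meaning rule.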

Section 04

Validation — Two Malware Families, Two Different Signatures

The script was validated against both pcaps analyzed in previous labs. The outputs were compared against findings from manual Wireshark analysis (Lab Log 004) and Zeek log analysis (Lab Log 005) to confirm accuracy.

Lumma Stealer — malware-traffic-analysis.net 2026-01-31

python3 zeek_triage.py ~/Desktop/zeek-lumma

============================================================
  ZEEK TRIAGE REPORT
============================================================

[ INFECTED HOST IDENTITY ]
------------------------------------------------------------
  IP Address : 10.1.21.58
  MAC Address: 00:21:5d:c8:0e:f2
  Hostname   : DESKTOP-ES9F3ML
  Username   : gwyatt

[ SUSPICIOUS HTTP REQUESTS ] — 1 unique hosts
------------------------------------------------------------
  GET, POST  whitepepper.su  (6 requests)
             /api/set_agent?id=3BF67EC05320C5729578BE4C0ADF174C&token=...

[ SUSPICIOUS TLS DOMAINS ] — 5 found
------------------------------------------------------------
  media.megafilehub4.lat           104.21.48.156  [.lat]
  whooptm.cyou                     62.72.32.156   [.cyou]
  whitepepper.su                   153.92.1.49    [.su]
  holiday-forever.cc               80.97.160.24   [.cc]
  communicationfirewall-security.cc 104.21.9.36   [.cc]

[ DATA VOLUMES TO SUSPICIOUS IPs ]
------------------------------------------------------------
  153.92.1.49
    Sent       : 2.10 MB
    Received   : 76.9 KB
    Connections: 24
  104.21.48.156
    Sent       : 158.3 KB
    Received   : 43.0 KB
    Connections: 1

[ SEVERITY SUMMARY ]
------------------------------------------------------------
  Suspicious HTTP requests : 1
  Suspicious TLS domains   : 5
  Total data exfiltrated   : 2.27 MB
  Overall severity         : CRITICAL

Easy As 123 C2 Beaconing — malware-traffic-analysis.net 2026-02-28

python3 zeek_triage.py ~/Desktop/zeek-easy123

============================================================
  ZEEK TRIAGE REPORT
============================================================

[ INFECTED HOST IDENTITY ]
------------------------------------------------------------
  IP Address : -
  MAC Address: 00:e0:4c:68:08:00
  Hostname   : brads-MBP
  Username   : brolf

[ SUSPICIOUS HTTP REQUESTS ] — 1 unique hosts
------------------------------------------------------------
  POST  45.131.214.85  (264 requests)
        http://45.131.214.85/fakeurl.htm

[ SUSPICIOUS TLS DOMAINS ] — 0 found
------------------------------------------------------------
  None found

[ DATA VOLUMES TO SUSPICIOUS IPs ]
------------------------------------------------------------
  45.131.214.85
    Sent       : 61.5 KB
    Received   : 532 bytes
    Connections: 1

[ SEVERITY SUMMARY ]
------------------------------------------------------------
  Suspicious HTTP requests : 1
  Suspicious TLS domains   : 0
  Total data exfiltrated   : 62.4 KB
  Overall severity         : MEDIUM

Scoring limitation identified: At this stage the script scored Easy As 123 as MEDIUM based on low data volume — 61.5KB sent. This was later identified as a flaw in the severity logic. A confirmed active C2 connection is CRITICAL regardless of how much data has moved yet. The malware is installed, beaconing, and waiting for commands — an attacker could send a payload at any moment. Lab Log 007 corrects this by introducing KNOWN_BAD_DOMAINS matching, which escalates any confirmed IOC hit to CRITICAL automatically.

Both outputs matched the manually confirmed findings from previous labs: the host identity, domain detections, and data volumes were all accurate. Only the severity rating for Easy As 123 was wrong, for the reason described above.

Section 05

Comparing the Two Malware Signatures

Running the script against both pcaps in sequence makes the behavioral contrast between malware families immediately visible — not through packet-level analysis but through the summary output alone.

Lumma Stealer — CRITICAL

HTTP hits: 1 host, 6 requests — acted fast, minimal footprint

TLS hits: 5 suspicious domains — moved to encrypted channels immediately

Exfiltrated: 2.27MB — active credential theft, damage already done

Signature: Fast HTTP recon → encrypted exfiltration → gone

Response: Isolate machine, rotate all credentials immediately

Easy As 123 C2 — CRITICAL (corrected in Lab 007)

HTTP hits: 1 host, 264 requests — persistent beaconing over 4+ hours

TLS hits: 0 — never bothered encrypting

Exfiltrated: 61.5KB — check-ins only, waiting for commands

Signature: Regular POST interval → POLL payload → C2 waiting

Response: Isolate machine, block C2 IP, investigate initial infection vector

Severity scoring flaw: The initial script scored Easy As 123 as MEDIUM because data volume was low. This was wrong. Severity is not about how much damage has occurred — it is about whether a confirmed threat is present. A machine with an active C2 connection is CRITICAL regardless of exfiltration volume. The attacker is in. Lab Log 007 corrects this by adding KNOWN_BAD_DOMAINS detection, which scores any confirmed IOC match as CRITICAL automatically — separating "damage done" from "threat confirmed."
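
The difference between the two scoring approaches can be sketched as below. The tier thresholds are assumptions chosen only to be consistent with the two reports above, the function names are hypothetical, and the contents of KNOWN_BAD_DOMAINS are illustrative. The actual Lab Log 007 logic may differ.

```python
KB = 1024
MB = 1024 * KB

# Illustrative entries taken from the two reports; the real list is defined in Lab Log 007.
KNOWN_BAD_DOMAINS = {"whitepepper.su", "45.131.214.85"}


def score_by_volume(hosts, bytes_sent):
    """Volume-only scoring, similar in spirit to this lab's version (thresholds assumed)."""
    if bytes_sent > MB:
        return "CRITICAL"
    if bytes_sent >= 100 * KB:
        return "HIGH"
    if hosts:
        return "MEDIUM"
    return "LOW"


def score_severity(hosts, bytes_sent):
    """Corrected idea: a confirmed IOC is CRITICAL no matter how little data moved."""
    if any(host in KNOWN_BAD_DOMAINS for host in hosts):
        return "CRITICAL"
    return score_by_volume(hosts, bytes_sent)


beacon_hosts = ["45.131.214.85"]
beacon_bytes = 62976  # 61.5 KB sent, as in the Easy As 123 report
print(score_by_volume(beacon_hosts, beacon_bytes))  # MEDIUM
print(score_severity(beacon_hosts, beacon_bytes))   # CRITICAL
```

Checking the IOC list before the volume tiers is what separates "threat confirmed" from "damage done": the volume tiers still apply, but only to hosts that are merely suspicious.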

Section 06

Usage

The script requires Python 3 and no external libraries. It runs on any system where Zeek logs are available — analyst workstation, jump server, or directly on the capture host.

Running the Script

# Step 1 — Generate Zeek logs from a pcap
mkdir zeek-output && cd zeek-output
zeek -r /path/to/capture.pcap

# Step 2 — Run triage script against the log folder
python3 zeek_triage.py /path/to/zeek-output

# Example against Lumma Stealer logs
python3 zeek_triage.py ~/Desktop/zeek-lumma

# Example against Easy As 123 logs
python3 zeek_triage.py ~/Desktop/zeek-easy123

Configuration — Customizing Detection

Two lists at the top of the script control detection behavior and can be edited without touching any function logic:

# High-abuse TLDs — connections to these domains are flagged
SUSPICIOUS_TLDS = [".su", ".ru", ".cc", ".xyz", ".top",
                   ".pw", ".cyou", ".lat"]

# Known-safe vendors — connections to these hosts are filtered out
SAFE_HOSTS = ["microsoft", "google", "akamai", "cloudflare",
              "adobe", "digicert", "windowsupdate", "msftconnect",
              "office", "bing", "gstatic", "msn", "azure"]

Adding a new TLD to SUSPICIOUS_TLDS immediately flags all matching domains in any future analysis. Adding a vendor to SAFE_HOSTS removes false positives without touching any function logic.
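
A rough sketch of how the two lists might be applied is shown here. The function names matching_tld and is_safe_host are hypothetical, and the suffix and substring matching is an assumption based on the descriptions in Section 02.

```python
SUSPICIOUS_TLDS = [".su", ".ru", ".cc", ".xyz", ".top",
                   ".pw", ".cyou", ".lat"]

SAFE_HOSTS = ["microsoft", "google", "akamai", "cloudflare",
              "adobe", "digicert", "windowsupdate", "msftconnect",
              "office", "bing", "gstatic", "msn", "azure"]


def matching_tld(server_name):
    """Return the suspicious TLD that the SNI name ends with, or None."""
    name = server_name.lower().rstrip(".")
    for tld in SUSPICIOUS_TLDS:
        if name.endswith(tld):
            return tld
    return None


def is_safe_host(host):
    """True if any safe-vendor keyword appears anywhere in the hostname."""
    return any(vendor in host.lower() for vendor in SAFE_HOSTS)


print(matching_tld("whitepepper.su"))           # .su
print(matching_tld("www.example.com"))          # None
print(is_safe_host("ctldl.windowsupdate.com"))  # True
print(is_safe_host("45.131.214.85"))            # False
```

One consequence of keyword matching, if the script works this way, is that a hostile domain containing a vendor keyword would also be filtered out, so entries in SAFE_HOSTS are best kept specific.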

Section 07

Planned Enhancements

The current script covers the core triage workflow. Two enhancements are planned for Lab Log 007:

Enhancement 01

KNOWN_BAD_DOMAINS List + Dynamic DNS Detection

A third configuration list, KNOWN_BAD_DOMAINS, will flag specific IOCs from previously analyzed malware alongside dynamic DNS providers (no-ip.com, duckdns.org, ddns.net, and others) that are frequently abused by malware infrastructure. Known-bad matches will be scored higher than TLD-based matches and marked distinctly in the report output.

Enhancement 02

Mac Application Wrapper — GUI Drag-and-Drop Interface

The script will be wrapped in a lightweight GUI using Python's built-in tkinter library, allowing a Zeek log folder to be dragged and dropped into a window rather than typed as a command line argument. The goal is a self-contained Mac application that an analyst can use without any terminal knowledge — making the triage workflow accessible as a tool rather than a script.

Section 08

NIST SP 800-171 Control Mapping

Applicable Controls

3.3.1

Audit logging. The script automates the extraction and analysis of network-layer audit logs generated by Zeek. Running it produces a structured finding record that can be retained as part of an incident audit trail.

3.14.6

Monitor systems and communications traffic for attacks. TLD-based domain detection and safe-host filtering implement an automated monitoring layer that fired on both malware families tested, without requiring the analyst to know what to look for in advance.

3.13.1

Boundary protection. Data volume analysis quantifies outbound transfers to suspicious IPs — providing evidence for egress filtering decisions and confirming whether exfiltration has occurred before incident escalation.

3.3.2

User activity traceability. Automated Kerberos extraction attributes all suspicious network activity to a specific user and machine in every report — producing the accountability chain required by this control without manual analysis steps.

Section 09

Lessons Learned

Automation Does Not Replace Understanding — It Requires It

The script works correctly because the underlying workflow was understood first. The DHCP → Kerberos → HTTP → TLS → conn.log sequence, the column offset in Zeek logs, the meaning of AS-REQ vs TGS, why SNI is visible in encrypted traffic — all of that knowledge from Lab Logs 003 through 005 went directly into the script logic. An analyst who skipped those labs and ran this script would not know whether to trust the output or how to investigate a finding it surfaced.

Deduplication Changes What the Report Means

The first version of get_suspicious_http printed every individual HTTP request — 264 identical POST lines for Easy As 123. That output is technically accurate but operationally useless. The revised version deduplicates by host and shows a count instead. The report went from 264 lines that required scrolling to 1 line that conveyed the same information instantly. Good security tooling presents findings at the right level of abstraction for the decision that needs to be made.
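
The deduplication step can be sketched as below. The function names are hypothetical and the rows are synthetic stand-ins for parsed http.log records. The host, method, and uri keys are standard http.log columns.

```python
def summarize_http(http_rows):
    """Collapse individual requests into one summary entry per host."""
    summary = {}
    for row in http_rows:
        host = row.get("host", "-")
        # Keep the first URI seen for the host as a representative example.
        entry = summary.setdefault(
            host, {"count": 0, "methods": set(), "uri": row.get("uri", "-")}
        )
        entry["count"] += 1
        entry["methods"].add(row.get("method", "-"))
    return summary


def format_line(host, entry):
    """One report line per host: methods, host, request count."""
    methods = ", ".join(sorted(entry["methods"]))
    return f"{methods}  {host}  ({entry['count']} requests)"


# 264 identical beacon requests, mirroring the Easy As 123 capture.
beacon_rows = [
    {"host": "45.131.214.85", "method": "POST",
     "uri": "http://45.131.214.85/fakeurl.htm"}
] * 264

summary = summarize_http(beacon_rows)
for host, entry in summary.items():
    print(format_line(host, entry))  # POST  45.131.214.85  (264 requests)
```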

Severity Scoring Must Be Tied to Behavior, Not Just Counts

Easy As 123 produced 264 suspicious HTTP requests. Lumma Stealer produced 6. A count-based severity score would rate Easy As 123 higher, yet Lumma moved far more data. At this stage the script therefore scores severity on data volume rather than request count: a machine that POSTed 2 MB to a .su domain needs immediate credential rotation regardless of how few requests it took. Volume alone is still incomplete, as the Easy As 123 result showed. A confirmed C2 beacon is CRITICAL even at 61.5 KB, which is the correction planned for Lab Log 007.

Next Lab

Lab Log 007 will add KNOWN_BAD_DOMAINS detection with dynamic DNS flagging and build the Mac application wrapper — a drag-and-drop GUI that runs the triage script without any terminal interaction. The goal is a self-contained tool that an analyst at any skill level can use on any Zeek log folder.
