PhishGuard — Alex Philip

02 · Detection Engine

How It Works

PhishGuard runs a seven-stage detection pipeline on each email. Structural indicators are weighted more heavily than keyword matches — they require deliberate construction and are far harder to accidentally trigger in legitimate email.

Parsing

Email Header Extraction

Raw email source is parsed in the browser. Key fields are extracted: From, To, Subject, Reply-To, Date, Received headers, and message body. The Received chain contains the ground truth about an email's origin — often very different from what the From field displays.
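The extraction step can be sketched roughly as follows. This is a minimal illustration rather than PhishGuard's actual source: the function name `parseEmail` and the shape of its return value are assumptions, and MIME decoding of the body is left out.

```javascript
// Hypothetical helper: split raw email source into headers, Received chain, and body.
function parseEmail(raw) {
  // Normalize line endings; headers end at the first blank line.
  const normalized = raw.replace(/\r\n/g, "\n");
  const split = normalized.indexOf("\n\n");
  const headerText = split === -1 ? normalized : normalized.slice(0, split);
  const body = split === -1 ? "" : normalized.slice(split + 2);

  // Unfold continuation lines: a line starting with a space or tab
  // continues the previous header.
  const unfolded = headerText.replace(/\n[ \t]+/g, " ");

  const headers = Object.create(null); // first occurrence of each header, lowercased name
  const received = [];                 // every Received header, in order of appearance
  for (const line of unfolded.split("\n")) {
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (name === "received") received.push(value);
    else if (!(name in headers)) headers[name] = value;
  }
  return { headers, received, body };
}
```

Keeping the Received headers in a separate array preserves the hop order, which the later IP checks rely on.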

Structural

Display Name Alarm Detection

The sender's display name is checked against psychological alarm vocabulary: hacked, suspended, urgent, warning, bitcoin, payment required. Legitimate organizations rarely put panic language in a display name, so each match is treated as a high-confidence structural signal weighted at +15 per finding.
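A minimal sketch of this check, assuming simple case-insensitive substring matching. The vocabulary and the +15 weight come from the description above; the function names and the return shape are hypothetical.

```javascript
// Alarm vocabulary as listed in the text.
const ALARM_WORDS = ["hacked", "suspended", "urgent", "warning", "bitcoin", "payment required"];

// Pull the display name out of a From value such as: "Some Name" <user@example.com>
function displayNameOf(fromValue) {
  const angle = fromValue.indexOf("<");
  const name = angle === -1 ? "" : fromValue.slice(0, angle);
  return name.trim().replace(/^"|"$/g, "");
}

function checkDisplayName(fromValue) {
  const name = displayNameOf(fromValue).toLowerCase();
  const findings = ALARM_WORDS.filter((word) => name.includes(word));
  return { findings, score: findings.length * 15 }; // +15 per finding
}
```

Substring matching is the simplest option; matching on word boundaries would be stricter but would miss variants such as "warnings".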

Structural

Brand Impersonation Check

The display name is cross-referenced against major brands — PayPal, Amazon, Google, Apple, Microsoft. If a brand name appears in the display name but the sending domain does not match the official domain, the discrepancy is flagged as spoofing.
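One way to express the cross-reference is sketched below. The brand names come from the text, but the domain list per brand is an illustrative assumption (a real list would need several domains per brand), and the function names are hypothetical.

```javascript
// Assumed brand-to-domain map; only the brand names are taken from the text.
const BRAND_DOMAINS = {
  paypal: ["paypal.com"],
  amazon: ["amazon.com"],
  google: ["google.com"],
  apple: ["apple.com"],
  microsoft: ["microsoft.com"],
};

// Extract the domain part of the address at the end of a From value.
function senderDomain(fromValue) {
  const match = fromValue.match(/@([^@\s<>"]+)>?\s*$/);
  return match ? match[1].toLowerCase() : "";
}

function checkBrandImpersonation(fromValue) {
  const angle = fromValue.indexOf("<");
  const displayName = (angle === -1 ? "" : fromValue.slice(0, angle)).toLowerCase();
  const domain = senderDomain(fromValue);
  const flagged = [];
  for (const [brand, officialDomains] of Object.entries(BRAND_DOMAINS)) {
    if (!displayName.includes(brand)) continue;
    // Accept the official domain itself or any subdomain of it.
    const official = officialDomains.some((d) => domain === d || domain.endsWith("." + d));
    if (!official) flagged.push(brand);
  }
  return flagged; // brands named in the display name but not backed by the sending domain
}
```

Checking `endsWith("." + d)` rather than a plain `includes` avoids accepting look-alike hosts such as `paypal.com.example.xyz`.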

Structural

Reply-To Mismatch Detection

A common phishing technique sets a legitimate-looking From address while routing replies to an attacker-controlled address. Domain mismatch between From and Reply-To is a strong structural indicator (+15 points). Example: From: support@paypal.com / Reply-To: attacker@gmail.com.
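The comparison can be sketched as below, using the example addresses from the paragraph above. The function names are hypothetical, and the sketch compares full domains exactly; comparing registrable domains (so that `mail.example.com` and `example.com` count as the same sender) would need a public suffix list and is not attempted here.

```javascript
// Extract the lowercased domain from an address value, or null if none is found.
function domainOf(addressValue) {
  const match = addressValue.match(/@([^@\s<>"]+)>?\s*$/);
  return match ? match[1].toLowerCase() : null;
}

function checkReplyTo(fromValue, replyToValue) {
  // No Reply-To header means there is nothing to compare.
  if (!replyToValue) return { mismatch: false, score: 0 };
  const fromDomain = domainOf(fromValue);
  const replyDomain = domainOf(replyToValue);
  const mismatch = Boolean(fromDomain && replyDomain && fromDomain !== replyDomain);
  return { mismatch, score: mismatch ? 15 : 0 }; // +15 as stated in the text
}
```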

Forensic

IP Address Extraction

A regex extracts IPv4 addresses from the Received headers and applies two cleanup steps: octets with leading zeros are rejected (preventing false positives from date-derived strings like 04.04.01.02), and addresses are deduplicated across all routing hops. A private IP in a public routing chain indicates possible header forgery.
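A hedged sketch of the extraction and filtering logic follows. It validates candidates after matching instead of encoding every rule in one regex, which is a design choice of this sketch and not necessarily how PhishGuard does it. The function names are hypothetical, and "private" is assumed here to mean the RFC 1918 ranges.

```javascript
// An octet is valid if it is 0-255 and has no leading zero ("04" is rejected).
function isValidOctet(part) {
  return /^(0|[1-9]\d{0,2})$/.test(part) && Number(part) <= 255;
}

// Collect unique, valid IPv4 addresses from an array of Received header values.
function extractIPs(receivedHeaders) {
  const seen = new Set(); // deduplicates across all hops, keeps first-seen order
  for (const header of receivedHeaders) {
    const candidates = header.match(/\d+(?:\.\d+)+/g) || [];
    for (const candidate of candidates) {
      const parts = candidate.split(".");
      if (parts.length === 4 && parts.every(isValidOctet)) seen.add(candidate);
    }
  }
  return [...seen];
}

// Assumed definition of "private": 10/8, 172.16/12, 192.168/16.
function isPrivateIP(ip) {
  const [a, b] = ip.split(".").map(Number);
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}
```

Dotted version strings in a header can still look like addresses, so a production check would likely restrict matching to the bracketed or parenthesized parts of each Received line.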

Content

Five-Category Keyword Scanner

The subject and body are scanned across five psychological manipulation categories: Urgency, Financial Lure, Threat, Credential Harvesting, and Extortion/Sextortion. Bitcoin wallet addresses are detected via regex. Any email with two or more extortion indicators is auto-escalated to a minimum score of 75.
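The scanner and the escalation rule could look roughly like this. The category names and the floor of 75 come from the text; the phrase lists are small illustrative samples, not the tool's real vocabulary, and counting a wallet address as one extortion indicator is an assumption of this sketch.

```javascript
// Illustrative phrase samples per category (assumed, not the tool's full lists).
const CATEGORIES = {
  urgency: ["act now", "immediately", "within 24 hours"],
  financialLure: ["you have won", "refund", "prize"],
  threat: ["account suspended", "legal action"],
  credentialHarvesting: ["verify your account", "confirm your password"],
  extortion: ["recorded you", "your webcam", "hacking group", "48 hours"],
};

// Rough shape of legacy (1... / 3...) and bech32 (bc1...) addresses; no checksum validation.
const BITCOIN_ADDRESS = /\b(?:bc1[a-z0-9]{25,59}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b/;

function scanKeywords(subject, body) {
  const original = `${subject}\n${body}`;
  const text = original.toLowerCase();
  const hits = {};
  for (const [category, phrases] of Object.entries(CATEGORIES)) {
    const found = phrases.filter((p) => text.includes(p));
    if (found.length > 0) hits[category] = found;
  }
  // Test the wallet pattern on the original text: legacy addresses are case-sensitive.
  const hasWallet = BITCOIN_ADDRESS.test(original);
  // Assumption: a wallet address counts as one extortion indicator.
  const extortionCount = (hits.extortion || []).length + (hasWallet ? 1 : 0);
  return { hits, hasWallet, escalate: extortionCount >= 2 };
}

// Auto-escalation is a floor of 75; it never lowers a higher score.
function applyEscalation(score, escalate) {
  return escalate ? Math.max(score, 75) : score;
}
```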

Content

URL Analysis

URLs extracted from the email body are checked for: suspicious TLDs (.xyz, .ru, .tk, .ml), URL shorteners hiding the true destination, IP addresses used as domains, excessive subdomains, redirect parameters, and brand name spoofing within non-official domains.
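A sketch of most of these checks using the standard `URL` class is shown below. The TLD list is taken from the text; the shortener list, the redirect parameter names, and the subdomain threshold are assumed values chosen for illustration, and the brand spoofing check is left out to keep the example short.

```javascript
const SUSPICIOUS_TLDS = [".xyz", ".ru", ".tk", ".ml"];        // from the text
const SHORTENERS = ["bit.ly", "tinyurl.com", "t.co"];          // illustrative subset
const REDIRECT_PARAMS = ["url", "redirect", "next", "goto"];   // illustrative subset
const MAX_HOST_LABELS = 4;                                     // assumed threshold

// Pull http(s) URLs out of a plain-text body.
function extractUrls(body) {
  return body.match(/https?:\/\/[^\s<>"')]+/gi) || [];
}

// Return a list of flag names for one URL.
function analyzeUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return ["unparseable"];
  }
  const host = url.hostname.toLowerCase();
  const flags = [];
  if (SUSPICIOUS_TLDS.some((tld) => host.endsWith(tld))) flags.push("suspicious-tld");
  if (SHORTENERS.includes(host)) flags.push("shortener");
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) flags.push("ip-as-domain");
  else if (host.split(".").length > MAX_HOST_LABELS) flags.push("excessive-subdomains");
  if (REDIRECT_PARAMS.some((p) => url.searchParams.has(p))) flags.push("redirect-param");
  return flags;
}
```

Each returned flag would then contribute to the score at the per-flag weight listed in the scoring breakdown.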

Scoring Breakdown

Alarm words in display name: +15 each
Reply-To domain mismatch: +15
Brand impersonation: +15
Private IP in routing: +8 each
Extortion keywords (2+): Auto ≥75
Credential harvesting: +10 per category
Suspicious URL flags: +8 per flag

Risk Verdict

70 – 100: HIGH RISK
40 – 69: MEDIUM RISK
15 – 39: LOW RISK
0 – 14: CLEAN
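The weights and verdict bands above can be combined as in the following sketch. The shape of the findings object is hypothetical, the "+10/cat" entry is read here as +10 per matched keyword category, and the cap at 100 is an assumption made so the total fits the 0 to 100 scale.

```javascript
const WEIGHTS = {
  alarmWord: 15,          // per alarm word in the display name
  replyToMismatch: 15,
  brandImpersonation: 15,
  privateIp: 8,           // per private IP in the routing chain
  keywordCategory: 10,    // assumption: per matched keyword category
  urlFlag: 8,             // per suspicious URL flag
};

// f is a hypothetical findings object holding counts and booleans from the earlier stages.
function computeScore(f) {
  let score =
    f.alarmWords * WEIGHTS.alarmWord +
    (f.replyToMismatch ? WEIGHTS.replyToMismatch : 0) +
    (f.brandImpersonation ? WEIGHTS.brandImpersonation : 0) +
    f.privateIps * WEIGHTS.privateIp +
    f.keywordCategories * WEIGHTS.keywordCategory +
    f.urlFlags * WEIGHTS.urlFlag;
  if (f.extortionIndicators >= 2) score = Math.max(score, 75); // auto-escalation floor
  return Math.min(score, 100); // assumed cap
}

function verdict(score) {
  if (score >= 70) return "HIGH RISK";
  if (score >= 40) return "MEDIUM RISK";
  if (score >= 15) return "LOW RISK";
  return "CLEAN";
}
```

Because the escalation floor of 75 sits above the HIGH RISK threshold of 70, any email with two or more extortion indicators lands in the top band regardless of its other findings.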

03 · Real-World Test

Tested on a Real Phishing Email

The tool was tested against a real sextortion/extortion scam received in a personal inbox — a sophisticated attack using psychological panic tactics and a Bitcoin payment demand.

📧 Information about your online security.eml

/ 100

HIGH RISK

From: "You've been HACKED" <kfixc@kawachi.zaq.ne.jp>
Subject: Information about your online security
IP Found: 222.227.81.164 (Public)
Reply-To: Not found

Indicators Detected

Alarming display name, Extortion keywords, Bitcoin wallet address, recorded you, your webcam, hacking group, 48 hours, payment immediately

Outcome

The extortion category auto-escalated the score when two or more extortion-specific indicators were found, correctly classifying this sextortion scam as HIGH RISK. Before the dedicated extortion category was added, this email scored only 8%, a critical detection gap that is now resolved.

04 · Feature Set

Full Capabilities

Email Header Forensics

Extracts and analyzes From, To, Subject, Reply-To, Date, and the full Received routing chain from .eml files or pasted raw source.

SMS Smishing Analyzer

Dedicated analyzer for text message phishing with 5 detection categories, URL scanning, phone UI preview, and actionable advice.

Extortion Detection

Dedicated sextortion/extortion category with auto-escalation scoring. Bitcoin wallet address detection via regex pattern matching.

IP Forensics

Extracts IPs from routing headers with false-positive filtering. Detects private IPs in public routing chains, a possible indicator of header forgery.

URL Scanner

Checks for suspicious TLDs, URL shorteners, IP-based domains, excessive subdomains, redirect parameters, and brand name spoofing.

Investigation Report

Generates a downloadable HTML investigation report with full findings, scoring breakdown, and risk verdict. Print-to-PDF supported.

Full Mobile Support

Bottom navigation bar, slide-up drawers, a floating action button (FAB), and an Email/SMS mode toggle, designed for full mobile usability on any device.

Keyword Detection

Five-category phishing language scanner covering Urgency, Financial Lure, Threat, Credential Harvesting, and Extortion, applied to both email and SMS.

No Backend Required

Runs entirely client-side in the browser. No server, no API, no data uploaded anywhere. Hosted free on GitHub Pages.

06 · Engineering

Challenges Solved

Sextortion Detection Gap

Standard keyword lists failed to detect sextortion emails, which scored only 8% despite being clear threats. Solved by adding a dedicated extortion category with targeted vocabulary and an auto-escalation rule that guarantees a score of at least 75 when two or more extortion-specific indicators are found.

False Positive IP Extraction

Email Date headers (e.g. Sat, 4 Apr 2026 01:02:27) contain digit sequences that the initial regex mistook for IPv4 addresses: it extracted 04.04.01.02 as a valid IP. Solved by rejecting octets with leading zeros and deduplicating across the full Received header chain.

Mobile Accessibility

The original tool required Google Colab, which made it unusable on mobile. Rebuilt as a pure HTML/CSS/JS application with a dedicated mobile layout: bottom navigation bar, slide-up drawers for input, and a floating action button (FAB). The Email/SMS mode toggle is accessible from both the sidebar and the mobile drawer.

Scoring Calibration

Simple keyword matching produces too many false positives — a legitimate urgent business email would score high. Solved by weighting structural indicators (display name, Reply-To, private IPs) more heavily than keyword matches, as these require deliberate construction.

Base64 Browser Parser Crash

An attempt to embed the logo as a 41,000-character base64 JPEG caused Safari's HTML parser to choke, rendering the entire script block as visible page text. Solved by replacing all base64 assets with inline SVGs — smaller, faster, and parser-safe across all browsers.

07 · Learnings

What I Learned

How email headers expose the true origin of a message, independent of what the From field displays

How to write robust regex patterns handling real-world edge cases like date-string false positives

How weighted scoring systems balance sensitivity and specificity in automated threat detection

How sextortion scams differ structurally from standard phishing, requiring dedicated detection categories

How to design mobile-first web applications with slide-up drawers, floating action buttons, and bottom navigation

How browser HTML parsers fail under malformed inline data, and how to build defensively with SVGs

How to take a Python/Colab prototype and rebuild it as a zero-dependency production web application

How smishing (SMS phishing) differs from email phishing in structure, vocabulary, and detection approach
