Machine learning to deep learning: The new frontline in global phishing defense
The growing sophistication of phishing attacks has triggered an arms race in cybersecurity, with artificial intelligence (AI) now emerging as a key defense mechanism. A new bibliometric review maps nearly two decades of scientific progress in AI-driven phishing detection.
Published in Frontiers in Artificial Intelligence and titled "AI in Phishing Detection: A Bibliometric Review," the study analyzes 1,096 documents from the Web of Science database spanning 2005 to 2025. It reveals a steep growth in publications, especially from 2016 onward, as researchers turned to machine learning (ML), deep learning (DL), and natural-language processing (NLP) to fight one of the most persistent forms of cybercrime.
The evolution of AI-based phishing detection
The review finds that phishing research surged after 2016, with 2024 standing out as the most productive year to date. Early studies relied mainly on machine-learning classifiers such as Support Vector Machines, Random Forest, and Decision Trees to flag fraudulent websites and emails based on features extracted from URLs, HTML structure, and metadata.
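The lexical and structural URL features mentioned above can be sketched in a few lines. The specific features below (URL length, dot counts, IP-address hosts, and so on) are common choices in this literature, not the exact feature set of any study in the review:

```python
import re
from urllib.parse import urlparse

def extract_url_features(url: str) -> dict:
    """Illustrative lexical URL features of the kind fed to classical
    ML classifiers (SVM, Random Forest, Decision Trees)."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),                      # phishing URLs tend to be long
        "num_dots": host.count("."),                 # many subdomains are suspicious
        "has_at_symbol": "@" in url,                 # '@' can hide the real host
        "has_hyphen_in_host": "-" in host,           # e.g. "paypal-secure.example"
        "uses_https": parsed.scheme == "https",
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "num_subdirs": parsed.path.count("/"),
    }

features = extract_url_features("http://192.168.0.1/login-secure/verify")
```

A classifier would then be trained on vectors of such features labeled phishing or benign.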
From 2018 onward, the field began a technological shift toward deep learning. Algorithms like Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), and hybrid stacking models improved scalability and accuracy by learning patterns automatically rather than depending on hand-crafted features. Studies cited in the review report detection accuracies approaching 99 percent when CNN models were trained on large datasets such as PhishTank and Alexa rankings.
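One reason deep models avoid hand-crafted features is that they can consume raw text directly, for example as a character-level one-hot matrix. The encoding below is a generic sketch of that preprocessing idea; the review's cited CNN architectures are not specified here, and the alphabet and length are arbitrary choices:

```python
# Character-level one-hot encoding of a URL, the typical raw input to a
# character CNN (alphabet and max_len are illustrative assumptions).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#@&="
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def one_hot_encode(url: str, max_len: int = 64) -> list[list[int]]:
    """Map each character to a one-hot row; pad or truncate to a fixed length
    so every URL yields the same input shape for the network."""
    matrix = []
    for ch in url.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_INDEX.get(ch)
        if idx is not None:          # unknown characters become all-zero rows
            row[idx] = 1
        matrix.append(row)
    while len(matrix) < max_len:     # zero-pad short URLs
        matrix.append([0] * len(ALPHABET))
    return matrix

encoded = one_hot_encode("http://phish.example/login")
```

The convolutional layers then learn which character n-gram patterns distinguish phishing URLs, replacing manual feature engineering.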
Feature engineering also became a decisive factor. Research introduced frameworks such as Hybrid Ensemble Feature Selection (HEFS) and CDF-g algorithms to pinpoint the most predictive variables while reducing false alarms and computational load.
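The core idea behind ensemble feature selection can be illustrated with a toy rank-aggregation scheme. To be clear, this is not the HEFS or CDF-g algorithm from the cited research (whose details the article does not give); it only shows the general pattern of combining several scoring methods and keeping the features they jointly favor:

```python
# Toy ensemble feature selection: several scorers each rank the features,
# and the mean rank decides which k features to keep. The scorer names and
# score values below are hypothetical.
from statistics import mean

def ensemble_select(scores_per_method: dict[str, dict[str, float]], k: int) -> list[str]:
    """Keep the k features with the best (lowest) average rank across methods."""
    ranks: dict[str, list[int]] = {}
    for scores in scores_per_method.values():
        ordered = sorted(scores, key=scores.get, reverse=True)  # best score -> rank 0
        for rank, feat in enumerate(ordered):
            ranks.setdefault(feat, []).append(rank)
    return sorted(ranks, key=lambda f: mean(ranks[f]))[:k]

selected = ensemble_select({
    "info_gain": {"url_length": 0.9, "num_dots": 0.4, "has_at": 0.7},
    "chi2":      {"url_length": 0.8, "num_dots": 0.6, "has_at": 0.3},
}, k=2)
```

Dropping low-ranked features shrinks the model's input, which is where the reductions in false alarms and computational load come from.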
According to the authors, this evolution marks a clear transition from traditional rule-based security toward data-driven, adaptive AI systems capable of identifying new or "zero-day" phishing threats that would bypass older filters.
Mapping the field: collaboration, influence, and emerging themes
Using the Biblioshiny (Bibliometrix package) and VOSviewer platforms, the authors conducted both performance analysis and science mapping. Their results highlight a 27.5 percent annual growth rate in publications and the participation of over 3,300 researchers from 644 sources worldwide.
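The annual growth rate reported by bibliometric tools is essentially the compound annual growth of yearly publication counts. The sketch below shows the standard formula; the publication counts in it are hypothetical, chosen only to illustrate the calculation, not taken from the study:

```python
# Compound annual growth rate of publication counts, in percent.
# The example counts are hypothetical; the review reports 27.5% for its corpus.
def annual_growth_rate(first_year_count: int, last_year_count: int, years: int) -> float:
    """((last / first) ** (1 / years) - 1) * 100"""
    return ((last_year_count / first_year_count) ** (1 / years) - 1) * 100

rate = annual_growth_rate(5, 640, 20)  # hypothetical: 5 papers -> 640 papers over 20 years
```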
Only about 3.5 percent of the papers were single-authored, reflecting the discipline's growing complexity and need for team-based approaches. The analysis shows a small subset of highly cited papers, roughly 10 percent of the total, accounting for more than half of all citations in the field.
Keyword co-occurrence analysis revealed five major thematic clusters shaping AI-driven phishing research: machine learning models for URL analysis, deep learning architectures for content classification, behavioral analytics and NLP for contextual understanding, Big Data integration for real-time detection, and human-factor awareness initiatives. These clusters illustrate how technical innovation and social engineering studies are slowly converging within cybersecurity.
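Keyword co-occurrence analysis starts from a simple count: how often two keywords appear on the same document. A minimal sketch of that counting step is below; tools like VOSviewer then add normalization and clustering on top of such a matrix, and the example keyword lists are invented for illustration:

```python
# Minimal keyword co-occurrence counting, the raw input behind cluster maps.
from collections import Counter
from itertools import combinations

def cooccurrence(keyword_lists: list[list[str]]) -> Counter:
    """Count how often each keyword pair appears in the same document."""
    pairs: Counter = Counter()
    for kws in keyword_lists:
        # sort so ("a", "b") and ("b", "a") count as the same pair
        for a, b in combinations(sorted(set(kws)), 2):
            pairs[(a, b)] += 1
    return pairs

counts = cooccurrence([
    ["phishing", "machine learning", "url"],   # hypothetical document keywords
    ["phishing", "deep learning"],
    ["phishing", "machine learning"],
])
```

Clustering the resulting weighted keyword network is what yields thematic groupings like the five clusters the review identifies.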
The authors note that while AI algorithms excel at identifying malicious patterns, user behavior remains a critical weak link. Phishing success often depends on human curiosity and compliance, yet few studies combine automated detection with user-centric training models. Future work, they argue, must bridge this gap through integrated defenses that teach employees how to recognize and report suspicious content while AI systems analyze it in real time.
From research to real-world cybersecurity
The authors document how modern AI models are being integrated into commercial security platforms such as Microsoft Defender, Google Workspace, Barracuda, and Abnormal Security. These systems use behavioral analytics, computer vision, and natural-language processing to detect suspicious URLs and emails before they reach users' inboxes.
The authors also propose that AI-based phishing detection be embedded into Security Information and Event Management (SIEM) platforms, endpoint protection software, secure email gateways, and cloud-based defense systems to create layered cyber resilience. This would allow continuous monitoring and automated incident response without relying solely on human intervention.
However, they warn that the rapid adoption of AI tools raises new ethical and regulatory challenges. Explainable AI (XAI) has become a priority for ensuring that automated decisions can be audited and understood by cybersecurity professionals. Opaque "black-box" models may offer accuracy but fail to provide insight into why a particular email was flagged as phishing. As the European Union's Artificial Intelligence Act tightens rules on transparency and accountability, companies deploying AI for critical infrastructure, such as banking and healthcare, must comply with strict evaluation and certification requirements.
The study also notes that scalability and data interoperability remain obstacles. Most models depend on standardized datasets like PhishTank or UCI, which may not capture new types of attacks targeting voice assistants or IoT devices. Future research must address these limitations through cross-institutional data sharing and privacy-preserving AI techniques.
- FIRST PUBLISHED IN:
- Devdiscourse