As a senior cybersecurity researcher at NIST's Center for AI Standards and Innovation (CAISI), I lead a research team tracking the cyber offense capabilities of AI systems through pre- and post-deployment model evaluations. I also support a team attacking AI systems to discover how these systems could be exploited by nefarious actors. Through close partnerships with leading AI labs and other U.S. government agencies, my work advances our understanding of AI's national security implications and strengthens the resilience of this critical technology.
Previously, I spent a decade at MIT Lincoln Laboratory in the Cyber System Assessments Group, where I led a research team at the intersection of dynamic program analysis, firmware security, and vulnerability discovery. In that role, I helped define the field of firmware rehosting, developed open-source tools, and applied them to evaluate the security of critical systems. Along the way, I built award-winning tools and techniques for evaluating vulnerability discovery systems, for reverse engineering, and for software exploitation. My background in systems security and cyber assessments now shapes how I approach AI.
I earned my PhD in Computer Science from Northeastern University and my BS from Rensselaer Polytechnic Institute, where I was an active member of RPISEC and its cyber capture the flag (CTF) team. I'm also passionate about cybersecurity education, having developed courses for universities, government agencies, and private companies. The materials from my System Security with Dynamic Program Analysis course are publicly available.
Dynamically constructing analysis environments for firmware code on demand. PENGUIN's novel approach enables dynamic security testing of firmware at scale, without any specialized hardware.
Problem: Critical embedded systems run firmware code that is difficult to analyze for security flaws because it is tightly coupled with specific hardware. Initial attempts at "firmware rehosting" built a single environment for analyzing firmware code that would only work for a small subset of devices.
Solution: PENGUIN provides a configurable interface for constructing a firmware rehosting environment, paired with a suite of analyses that iteratively construct and refine this environment based on observed runtime behavior. By automatically building rehosting environments of high enough quality to support complex security analyses, PENGUIN enables security analysis of firmware code at scale.
Tech Stack: Multi-repo CI/CD with multi-stage Docker builds on Ubuntu 22.04; Rust and Python packages; custom Linux kernels, emulators, and guest utilities; cross-compilation toolchains and multi-architecture support (ARM, MIPS, PowerPC, RISC-V); integrated firmware analysis and rehosting libraries.
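The iterative construct-and-refine loop at the heart of this approach can be sketched in a few lines. This is a toy illustration only: the names (`Config`, `run_emulation`, `refine_until_healthy`) and the two-peripheral "firmware" are invented for this sketch, not PENGUIN's actual API.

```python
# Toy sketch of an iterative rehosting-refinement loop. All names and the
# simulated firmware here are hypothetical, not PENGUIN's real interface.
from dataclasses import dataclass, field

@dataclass
class Config:
    """A rehosting configuration: which peripherals to model and how."""
    peripherals: dict = field(default_factory=dict)

def run_emulation(config):
    """Pretend to emulate firmware; report the first unmodeled peripheral."""
    required = {"uart0": "serial", "nvram": "key-value store"}
    for name, kind in required.items():
        if name not in config.peripherals:
            return name, kind   # emulation failed: a peripheral model is missing
    return None                 # firmware booted successfully

def refine_until_healthy(config, max_iters=10):
    """Iteratively patch the configuration until emulation succeeds."""
    for _ in range(max_iters):
        failure = run_emulation(config)
        if failure is None:
            return config
        name, kind = failure
        config.peripherals[name] = kind  # add a model for it, then retry
    raise RuntimeError("could not build a working environment")

cfg = refine_until_healthy(Config())
print(sorted(cfg.peripherals))  # ['nvram', 'uart0']
```

Each failed emulation attempt yields information about what the environment is missing; the real system's analyses play the role of this sketch's hard-coded failure report.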
Publication
PENGUIN: Target-Centric Firmware Rehosting
Andrew Fasano, Zachary Estrada, Luke Craig, Ben Levy, Jordan McLeod, Jacques Becker, Elysia Witham, Cole DiLorenzo, Caden Kline, Ali Bobi, Dinko Dermendzhiev, Tim Leek, William (Wil) Robertson
Injecting new logic into a running virtual machine from the hypervisor. Hypervisor Dissociative Execution (HyDE) enables cloud providers to offer new monitoring, management, and security services for any Linux VM without installing specialized software.
Problem: Cloud providers give end-users hardware for running virtual machines (VMs), but then have limited visibility into what is happening inside those VMs. Existing approaches ask users to install new software within their VM or use brittle virtual machine introspection techniques that are difficult to scale.
Solution: HyDE enables cloud providers to inject new logic into a running VM by leveraging the stable syscall ABI as a programming interface from the hypervisor. With HyDE, cloud providers can offer new monitoring, management, and security services for any Linux VM without requiring users to install specialized software. For example, HyDE can be used to generate dynamic software bills of materials (SBOMs), reset passwords, or add out-of-band two-factor authentication (2FA) on root logins. When built atop HyDE, these services are compatible with a wide range of Linux kernels and distributions without requiring any changes to the guest VM.
Tech Stack: QEMU/KVM-based hypervisor with customizations in both user- and kernel-space; C++ software development kit to enable users to create their own HyDE programs as asynchronous coroutines; C++ runtime for tracking guest state and managing active HyDE programs.
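The coroutine model can be illustrated with a toy Python sketch (the real SDK is C++): a HyDE-style program yields syscall requests, and the hypervisor loop injects each into the guest and resumes the coroutine with the result. The simulated guest state and syscall names below are assumptions for illustration.

```python
# Toy model of HyDE's coroutine pattern in Python (the real SDK uses C++
# coroutines). A "program" yields syscall requests; the hypervisor services
# each one against a simulated guest and resumes the program with the result.

def read_file(path):
    """A HyDE-style program: open/read/close a guest file via injected syscalls."""
    fd = yield ("open", path)     # request: inject open() into the guest
    data = yield ("read", fd)     # request: inject read() on that fd
    yield ("close", fd)
    return data

def hypervisor_run(program):
    """Drive a program to completion, servicing each yielded syscall request."""
    guest_files = {"/etc/hostname": b"guest-vm\n"}   # simulated guest state
    open_fds = {}
    try:
        request = program.send(None)
        while True:
            op, arg = request
            if op == "open":
                fd = len(open_fds) + 3
                open_fds[fd] = guest_files[arg]
                reply = fd
            elif op == "read":
                reply = open_fds[arg]
            else:  # close
                open_fds.pop(arg, None)
                reply = 0
            request = program.send(reply)
    except StopIteration as done:
        return done.value          # the program's final result

print(hypervisor_run(read_file("/etc/hostname")))  # b'guest-vm\n'
```

Because the program only speaks the syscall ABI, the same logic works across guest kernels and distributions without any in-guest agent.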
Publication
HyDE: Hypervisor Dissociative Execution
Andrew Fasano, Zak Estrada, Timothy Leek, William Robertson
LAVA automatically injects realistic vulnerabilities into software for evaluation; the Rode0day competition used these bugs to evaluate vulnerability discovery tools and approaches.
Problem: Evaluating vulnerability discovery tools is difficult due to a lack of ground truth. Real-world vulnerabilities are rare, and existing benchmarks are small, outdated, or unrealistic. Once a benchmark is published, future work can overfit to it, making it less useful for evaluating new techniques.
Solution: LAVA (Large-scale Automated Vulnerability Addition) uses dynamic taint tracking, program rewriting, and automated hypothesis testing to create a continuous stream of realistic vulnerabilities in real-world software. Building on LAVA, the Rode0day competition challenged participants to find previously-unseen vulnerabilities generated by LAVA, providing a reproducible platform for evaluating vulnerability discovery tools. LAVA was awarded an R&D 100 Award in 2020 for its impact on the field of vulnerability discovery evaluation.
Tech Stack: Dynamic taint analysis with PANDA.re, paired with LLVM-IR-based source-level transformations to map dynamic taint information back to source code locations; Flask-based web application with extensive sandboxing to manage user submissions; distributed fuzzing campaigns across a Slurm-based HPC cluster using Singularity containers.
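The flavor of the injection step can be shown with a toy sketch. Real LAVA uses taint tracking to find attacker-controlled data (a DUA) and LLVM-based rewriting on real C programs; here the DUA expression is simply supplied by the caller, and all names are invented for illustration.

```python
# Toy sketch of LAVA-style bug injection (illustrative only; LAVA itself uses
# dynamic taint tracking plus compiler-based rewriting). We insert a crash
# guarded by a magic value at an attacker-controlled expression -- the essence
# of triggering an injected bug only on specific inputs.

MAGIC = 0x6C617661  # "lava" -- the trigger value for the injected bug

def inject_bug(c_source, anchor_line, dua_expr):
    """Insert a guarded crash after an anchor line in C source text.

    dua_expr names attacker-controlled data (found by taint analysis in
    real LAVA); here it is a caller-supplied assumption.
    """
    guard = f"  if (({dua_expr}) == 0x{MAGIC:X}) {{ *(int*)0 = 0; }} /* injected */"
    lines = c_source.splitlines()
    lines.insert(anchor_line, guard)
    return "\n".join(lines)

original = "\n".join([
    "int parse(unsigned int *buf) {",
    "  int total = buf[0];",
    "  return total;",
    "}",
])
buggy = inject_bug(original, anchor_line=2, dua_expr="buf[1]")
print(buggy)
```

Because each injected bug fires only when its magic value appears at a known input offset, a crashing input doubles as ground truth: the answer key is the set of triggering inputs.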
Publications
Evaluating Synthetic Bugs
Joshua Bundt, Andrew Fasano, Brendan Dolan-Gavitt, William Robertson, Timothy Leek
ACM AsiaCCS 2021
The Rode0day to Less-Buggy Programs
Andrew Fasano, Tim Leek, Brendan Dolan-Gavitt, Josh Bundt
Emulator designed for dynamic program analysis. PANDA.re provides a platform for whole-system dynamic analysis with deterministic record/replay, taint tracking, and a rich C/C++/Python plugin ecosystem.
Problem: Whole-system dynamic program analysis is a powerful technique for understanding how complex software interacts with the underlying operating system and other programs. But building this type of analysis requires a platform that provides low-level visibility into system events, an architecture that supports extensibility, and the ability to compose multiple analyses together.
Solution: PANDA.re extends the QEMU emulator to provide a platform for whole-system dynamic analysis with low-level callbacks, deterministic record/replay, and a rich plugin ecosystem. PANDA plugins implement complex virtual machine introspection capabilities to recover high-level abstractions from low-level events, track data flow through memory and registers, and analyze system behavior in real time. Through its Python interface, PANDA allows researchers to simultaneously orchestrate the guest system and implement complex analyses.
Tech Stack: Large (over 1.5M lines of code) C/C++ codebase customizing QEMU to add support for tracking low-level hardware events, a plugin system, and Python3 bindings; C/C++/Python/Rust plugins for dynamic taint analysis, syscall tracing, OS introspection, and more; GitHub Actions for CI/CD releases to DockerHub and PyPI.
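The callback-driven plugin pattern that makes these analyses composable can be shown with a toy sketch. The class and event names below are invented for illustration; they are not the real PyPANDA API.

```python
# Toy illustration of the callback-driven plugin pattern PANDA uses.
# Names here (ToyEmulator, "block_exec", "syscall") are hypothetical.

class ToyEmulator:
    """Minimal emulator core that fires callbacks on guest events."""
    def __init__(self):
        self.callbacks = {"block_exec": [], "syscall": []}

    def on(self, event):
        """Decorator: register a plugin callback for a guest event."""
        def register(fn):
            self.callbacks[event].append(fn)
            return fn
        return register

    def run(self, trace):
        """'Execute' a recorded trace, dispatching each event to every plugin."""
        for event, payload in trace:
            for fn in self.callbacks[event]:
                fn(payload)

emu = ToyEmulator()
coverage = set()
syscalls = []

@emu.on("block_exec")
def track_coverage(pc):
    coverage.add(pc)        # a coverage plugin

@emu.on("syscall")
def trace_syscalls(name):
    syscalls.append(name)   # a syscall-tracing plugin, composed with the first

emu.run([("block_exec", 0x1000), ("syscall", "open"),
         ("block_exec", 0x1000), ("block_exec", 0x2000)])
print(sorted(coverage), syscalls)  # [4096, 8192] ['open']
```

Two independent analyses observe the same execution without knowing about each other; combined with deterministic record/replay, the same trace can be re-analyzed with heavier plugins after the fact.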
Contribution note: I served as the maintainer of PANDA.re from 2020-2024, during which time I expanded the platform to support new guest architectures and new analysis capabilities, and upgraded it to a modern CI/CD pipeline. I expanded core plugins, including the Linux virtual machine introspection (VMI) system, to support additional architectures. I developed and delivered a number of training courses on PANDA.re, and helped onboard new users to the platform while growing the open-source community around the project. PANDA.re is developed by a large team of contributors and this work would not have been possible without their significant efforts!
Publication
PyPANDA: Taming the Pandamonium of Whole System Dynamic Analysis
Luke Craig, Andrew Fasano, Tiemoko Ballo, Timothy Leek, Brendan Dolan-Gavitt, William Robertson
Problem: Firmware rehosting is a growing field of research, but prior advances in the space have been ancillary to enabling other research tasks such as fuzzing web applications. As a result, the field lacks a common vocabulary, understanding of the design space, and ability to compare different approaches.
Solution: In partnership with an international team of researchers, I led development of a systematization of knowledge (SoK) paper that defines the design space for firmware rehosting research. This work establishes common terminology, evaluation criteria, and a taxonomy that subsequent work (including PENGUIN) builds on.
Publication
SoK: Enabling Security Analyses of Embedded Systems via Rehosting
Andrew Fasano, Tiemoko Ballo, Marius Muench, Tim Leek, Alexander Bulekov, Brendan Dolan-Gavitt, Manuel Egele, Aurelien Francillon, Long Lu, Nick Gregory, Davide Balzarotti, William Robertson
Problem: When software fuzzers try to generate inputs that reach previously-untested parts of a program, they often get stuck on specific conditional statements, unable to generate the right input to move past them. Human analysts can often identify these "roadblocks" and guide the fuzzer past them, but it is difficult for analysts to know where to focus their efforts.
Solution: We developed compartment analysis, a novel static analysis that identifies large, under-covered regions of code that are likely to contain new functionality. By focusing human effort on these compartments, we can guide fuzzers to explore new parts of the program more effectively. Our experiments show that this approach leads to significant coverage gains and helps discover new vulnerabilities.
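The intuition can be sketched with a small heuristic. This is an illustration only: the paper's compartment analysis is a static analysis over the program's control-flow graph, not the simple per-function count below, and all names here are invented.

```python
# Toy sketch of the compartment-analysis intuition (illustrative only; the
# real analysis works over control-flow graphs). Given per-function basic
# blocks and fuzzer coverage, rank functions by how much uncovered code
# they hold -- the best places to focus scarce human attention.

def rank_compartments(functions, covered_blocks):
    """Score each function by its count of uncovered basic blocks."""
    scores = {}
    for name, blocks in functions.items():
        uncovered = [b for b in blocks if b not in covered_blocks]
        scores[name] = len(uncovered)
    # Highest score first: large, under-covered regions likely hide
    # unexplored functionality (and bugs).
    return sorted(scores, key=lambda n: -scores[n])

functions = {
    "parse_header": {0x10, 0x20, 0x30},
    "decompress":   {0x40, 0x50, 0x60, 0x70, 0x80},
    "cleanup":      {0x90},
}
covered = {0x10, 0x20, 0x30, 0x90}   # blocks the fuzzer has reached
print(rank_compartments(functions, covered))
```

In this example the fuzzer has fully explored parsing and cleanup but never entered `decompress`, so a human would be directed to the conditional guarding it.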
Publication
Homo in Machina: Human Guidance for Fuzzing
Josh Bundt, Andrew Fasano, Brendan Dolan-Gavitt, William Robertson, Timothy Leek
2023 IEEE Conference on Software Testing, Verification and Validation
Problem: Cybersecurity competitions (Capture the Flag, or CTF) provide a fun and engaging way to practice and improve cybersecurity skills. However, building high-quality challenges that are both educational and engaging is difficult and time-consuming.
Solution: Outside of my professional responsibilities, I have been an active participant in the cybersecurity competition community since 2013. I have mentored teams, built challenges, and competed in numerous competitions. At separate times I have both competed in and built challenges for DEF CON CTF, the most prestigious CTF competition in the world.
AI Targets, AI Attackers: A Case Study on AI Agent Exploit Chains
2025.08
Redacted (US Government Conference)
A case study examining sophisticated exploit chains that combine traditional software vulnerabilities with AI-specific attack vectors.
Abstract not publicly available.
A Reverse Engineer's Guide to Mechanistic Interpretability
2024.08
DEF CON AIxCC
Introducing mechanistic interpretability to the reverse engineering community and exploring how traditional RE techniques apply to understanding neural networks.
While the world buzzes about AI-augmented reverse engineering, what about turning the tables and reverse engineering AI itself? As artificial intelligence systems grow increasingly complex and pervasive, decoding their inner workings has become not just a fun challenge, but a critical necessity. This talk introduces the emerging field of mechanistic interpretability to the reverse engineering community, revealing how the frontier of AI research is reinventing wheels long familiar to RE experts.
We'll explore how traditional reverse engineering techniques are finding new life in dissecting neural networks, and why the RE community's hard-earned wisdom is more relevant than ever in the age of AI. The presentation will demystify key concepts in mechanistic interpretability such as features, circuits, and superposition, mapping them onto familiar RE paradigms.
Attendees will gain insights into:
The parallels between reverse engineering software and decoding AI systems
Current challenges in mechanistic interpretability
The golden opportunities for reverse engineers to contribute to this critical field and potentially reshape the future of AI safety
This talk aims to spark a cross-pollination between reverse engineering and AI research communities. Whether you're a seasoned reverse engineer itching for a new challenge, or an AI researcher seeking fresh perspectives, prepare to view artificial intelligence through a new lens.
The Trials, Tribulations, and Triumphs of Whole System Dynamic Analysis
2023.10
New York University
Lessons from a decade in the trenches developing and utilizing PANDA.re for dynamic program analysis across programs, operating systems, and embedded devices.
Computer security professionals are always asking the age-old question, "What can this program do?" Often that question is asked using fancy static analysis tools and techniques to examine different paths through a program. But wouldn't it be easier to just run a program and see what it did? In an idealized CS 101 world, this would be straightforward, as programs would abide by our imagined abstraction layers.
In reality, our computers are full of deceptions, from memory aliasing to speculative execution, concealing complexity from developers at every corner. This talk will dive into the deep end of dynamic program analysis (DPA), exploring the tumultuous landscape filled with both triumphs and pitfalls.
Drawing on lessons learned from a decade of developing and utilizing PANDA.re, an open-source DPA platform, the presentation will demonstrate how DPA can be applied to individual programs, entire operating systems, and even embedded systems like wireless routers and IoT devices. Andrew will introduce existing analysis tools, demonstrate how approachable (or challenging) crafting your own analyses can be, and showcase real-world applications and missteps.
Horror stories of failure will be paired with invaluable lessons we learned along the way. Andrew will conclude with a brief overview of his experience working on the research staff at MIT Lincoln Laboratory and opportunities for both careers and collaborations there.
The LAVA has Hardened! Building a Better Bug Corpora to Evaluate Bug-Finders
2019.10
AvengerCon
Presenting advances in LAVA's automated vulnerability injection system for creating realistic bug datasets to evaluate security tools.
How good is your cutting-edge fuzzer prototype? Can it compete against an elite team of reverse engineers? How about an advanced static analysis system? We released the LAVA-M corpus in 2016 in an attempt to quantify how many bugs these proven techniques fail to find. Since the initial corpus was released with four programs and thousands of synthetic bugs, we have seen huge improvements in bug-finding technology, and most modern tools can find nearly all the bugs in the corpus. To provide new corpora to evaluate modern bug-finding systems, we have open-sourced an improved, more challenging version of the bug-injection framework. With this new system, we launched Rode0day, an online bug-finding competition where competitors have one month to identify as many bugs within a program as possible.
This talk will focus on our LAVA bug-injection framework and how it can be used to generate buggy programs on-demand for a range of evaluation needs. Furthermore, we will discuss Rode0day and insights from the first year of competition.
Rode0day: A Year of Bug-Finding Evaluations
2019.08
USENIX WOOT
Analysis of data from the first year of Rode0day competitions, examining what makes bugs hard to find and how to improve bug-finding tools.
Why are some bugs so hard to find? Why are some bug-finding tools more effective than others? How can we improve bug-finding tools? In May 2018, we launched Rode0day, a monthly bug-finding competition designed to answer these questions.
In our first year of competitions, we injected thousands of synthetic bugs into more than 50 programs, evaluated 35 bug-finders as they searched for bugs, and collected information on when teams found bugs as well as properties of the bugs themselves.
In this talk we will present our analysis of this data and use it to identify strengths and weaknesses of tools, discuss what properties of an injected bug make it easy or hard to find, and suggest ways of improving bug-finders.
Rode0day: Searching for Truth with a Bug-Finding Competition
2018.08
USENIX WOOT
Behind-the-scenes look at launching Rode0day, a continuous bug-finding competition using automated vulnerability injection.
This summer, we launched Rode0day, a continuous bug-finding competition designed to improve understanding of bugs and bug-finding. Using automated vulnerability injection, we add hundreds of new bugs to programs every month and challenge competitors to find as many bugs as they can. Competitors are awarded points for generating inputs that trigger a new bug and cause the program to crash. At the end of each competition, we release an answer key containing crashing inputs for each injected bug.
We want Rode0day to help push the state-of-the-art in bug-finding forward by enabling analysis based on ground-truth. We hope that this competition provides a fair and concrete evaluation to study the relative strengths and weaknesses of bug-finding systems.
In this talk, I'll provide a behind-the-scenes look at how we run these competitions with a focus on challenge generation and then share some preliminary analyses of data from our first two competitions.
LAVA was awarded an R&D 100 Award for its impact in advancing the state of the art in vulnerability discovery and enabling rigorous, large-scale evaluation of automated security tools.