Using Generative AI to Automate Application Security Reviews of Pull Requests
What Is a Pull Request?
A pull request (PR) is a key part of modern software development workflows, especially in Git-based version control systems like GitHub or GitLab. When a developer makes a code change on a separate branch, they create a pull request to propose merging those changes into the main codebase. PRs allow teams to collaborate on code review, catch bugs, and enforce security and quality standards before changes go live.
In security-sensitive environments PRs are a critical checkpoint, not just for correctness but for catching security flaws before they reach production.
Why Automate PR Security Reviews in an Enterprise?
In a fast-moving enterprise DevOps environment, hundreds or even thousands of PRs may be opened each month. Manual security reviews for every change simply don’t scale. While security teams want to gate risky code from being merged, they also don’t want to be a bottleneck for delivery.
This creates a familiar tension:
Developers want velocity.
Security wants visibility and control.
Automation is the answer.
By using GenAI to analyze PRs, security teams can:
Automatically identify suspicious or high-risk changes.
Prioritize manual reviews on impactful code.
Reduce human error and blind spots.
Keep up with the speed of development without sacrificing safety.
Introducing the Proof-of-Concept
Here is a proof-of-concept Python script that integrates with the GitHub API and OpenAI’s GPT-4 model. It performs the following actions:
Fetches the raw diff (code changes) from a pull request.
Sends the diff to GPT-4 for summarization and security analysis.
Returns a structured response, including:
Summary of changes
Identification of potential vulnerabilities
Suggested severity/CVSS ratings
Actionable recommendations
A decision prompt: “Contact Security Team” vs. “Build is OK to merge”
It’s a lightweight, flexible tool that can be plugged into a CI/CD pipeline.
Prompt Breakdown & Engineering Techniques
Here’s the core prompt:
This prompt uses several prompt engineering techniques:
Role assignment: “You are a security-focused software engineer” gives the model a specific lens through which to interpret the code.
Structured instructions: Numbered tasks guide the model through analysis, reducing hallucinations and missed steps.
Conditional logic: Asking for a decision between “Contact Security Team” or “OK to merge” forces the model to synthesize and judge risk—useful in CI pipelines.
Real-World Example: Analyzing OWASP Juice Shop PR #1695
To demonstrate this proof of concept in action, we tested it on a real-world pull request from OWASP Juice Shop, a deliberately vulnerable application designed for training and testing application security skills.
PR #1695 introduces a “dummy vulnerability,” making it an ideal test case for automated PR analysis.
Here’s what happened:
The script fetched the raw diff for the PR using the GitHub API.
The diff was passed to GPT-4 along with our custom security prompt.
The AI-generated output identified the high-level changes and flagged the insecure code.
AI Output:
Security Review Workflow in CI/CD
Here’s how this system would work from the security team’s point of view in an enterprise CI/CD pipeline:
Developer opens a PR → Triggers pipeline.
CI job fetches the PR diff using GitHub API.
Script sends the diff to OpenAI for analysis.
AI response is parsed:
If “Build is OK to merge” → CI continues.
If “Contact Security Team” → CI blocks merge and notifies AppSec team.
Security engineer reviews flagged PRs, leveraging the AI summary as a triage aid.
Approved PRs proceed, while flagged ones get additional review.
This workflow allows:
100% PR coverage.
Near-instant triage of suspicious changes.
Focused manual review on code that actually matters.
What’s Next?
This proof-of-concept shows how powerful generative AI can be in real-world AppSec automation. It’s not meant to replace manual security review—but to augment it. With careful tuning and human-in-the-loop validation, GenAI can become a force multiplier for AppSec teams in fast-paced DevOps cultures.