
Project Detail
AI Red Teaming & Content Moderation for BlueSky
Lead Software Engineer
4 months
New York, NY
As Lead Software Engineer, I built an automated content moderation system for BlueSky that detects and labels sensitive and sexual content. The pipeline combines keyword detection, emoji patterns, and perceptual image hashing, and reached 94% accuracy and 100% precision across 100 test posts.
I implemented the backend with Python, the BlueSky API, OpenCV, Scikit-learn, and JSON-based detection rules, and added safeguards to minimize false positives and protect user privacy.
Batching Test Video
Implementation: Our Labelers

Project Goals
Challenge:
Detecting sexual content on social platforms
Why it matters:
Enables user choice in content filtering
Protects minors from inappropriate content
Respects content creators' freedom of expression
Our approach:
Detection rather than censorship
Examples of targeted content:
Sharing non-consensual nude images or videos (e.g., revenge porn).
Posting intimate content of someone online without their consent.
Sending unsolicited sexual videos, messages, or comments about a person’s body or appearance.
Technical Overview: Architecture

Core Components:
Sexual terminology dictionary with hierarchical structure
Pattern-based detection system using regular expressions
Context analysis with legitimate context filtering
Explicit intensity scoring (0-5 scale)
Image perceptual hash matching system
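The components above can be illustrated with a minimal sketch. It is not the project's actual code: the term lists, patterns, context phrases, the 3-point context penalty, and all function names are hypothetical, and a simple average hash stands in for whichever perceptual hash the labeler really uses. The hash function takes a small grayscale grid directly; in practice the image would first be converted to grayscale and downscaled (for example with OpenCV).

```python
import re

# Hypothetical hierarchical dictionary: category -> term -> intensity (0-5).
TERMS = {
    "explicit": {"porn": 5, "xxx": 4, "nsfw": 3},
    "suggestive": {"nudes": 3, "lewd": 2},
}

# Hypothetical regex patterns, each paired with an intensity.
PATTERNS = [
    (re.compile(r"\bsend\s+(me\s+)?nudes\b", re.IGNORECASE), 4),
    (re.compile(r"\b18\s*\+\s*only\b", re.IGNORECASE), 2),
]

# Hypothetical legitimate contexts that should lower the score.
LEGITIMATE_CONTEXT = re.compile(
    r"\b(sex education|sexual health|medical|research)\b", re.IGNORECASE
)


def score_text(text: str) -> int:
    """Return an explicit-intensity score from 0 (none) to 5 (most explicit)."""
    words = re.findall(r"[a-z0-9+]+", text.lower())
    score = 0
    # Dictionary lookup: keep the highest intensity of any matched term.
    for terms in TERMS.values():
        for word in words:
            score = max(score, terms.get(word, 0))
    # Pattern matching: phrases that single-word lookup would miss.
    for pattern, intensity in PATTERNS:
        if pattern.search(text):
            score = max(score, intensity)
    # Context filtering: assumed 3-point reduction in legitimate contexts.
    if score and LEGITIMATE_CONTEXT.search(text):
        score = max(0, score - 3)
    return min(score, 5)


def average_hash(pixels: list[list[int]]) -> int:
    """Average hash of a small grayscale grid, returned as an integer bitmask."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def matches_known_image(pixels, known_hashes, max_distance=5) -> bool:
    """True if the grid's hash is within max_distance bits of any known hash."""
    h = average_hash(pixels)
    return any(hamming_distance(h, k) <= max_distance for k in known_hashes)
```

Taking the maximum intensity rather than a sum keeps the score on the 0-5 scale, and the Hamming-distance threshold lets lightly edited or recompressed copies of a known image still match.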
Empirical Results

Analysis of Performance and Outcome
Key Findings
Perfect precision (100%) across all batches
Our labeler never incorrectly flagged a regular post as containing sexual content.
High overall accuracy (94%)
The labeler correctly classified 94 out of 100 posts.
Strong recall (90%)
The labeler successfully identified 54 out of 60 posts containing sexual content.
Consistent performance
Processing time remained stable across all batches (approximately 0.27 seconds per post).
Batch variation
Batch 3 achieved perfect scores across all metrics, while Batch 4 had the lowest performance with 3 false negatives.
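The reported figures can be cross-checked from a confusion matrix. The sketch below is illustrative, and the function name is hypothetical. The counts come from the findings above: 54 true positives and 6 false negatives (54 of 60 sexual-content posts identified), 0 false positives (100% precision), and therefore 40 true negatives among the remaining posts in the 100-post set.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Counts derived from the findings above (TN = 100 - 60 positives - 0 FP).
metrics = classification_metrics(tp=54, fp=0, fn=6, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

The three values agree with the 94% accuracy, 100% precision, and 90% recall stated above.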
Code Documentation and Style
The complete code is available on GitHub. Please refer to the README.md file for details.
The presentation with detailed methodologies and techniques can be found here.
rh692@cornell.edu
©Gloria Hu 2025