Project Detail

AI Red Teaming & Content Moderation for BlueSky

Lead Software Engineer

4 months

New York, NY

As Lead Software Engineer, I built an automated content moderation system for BlueSky that detects and labels sensitive and sexual content. The pipeline combines keyword detection, emoji patterns, and perceptual image hashing, and reached 94% accuracy and 100% precision on a 100-post test set.

I implemented the backend in Python using the BlueSky API, OpenCV, scikit-learn, and JSON-based detection rules, with safeguards to minimize false positives and protect user privacy.
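To make the idea of JSON-based detection rules concrete, here is a minimal sketch of how such rules might be loaded and applied to post text. The rule schema, the field names (`keywords`, `emoji_patterns`, `safe_contexts`), the placeholder terms, and the helper names `load_rules` and `should_label` are all assumptions for illustration; the project's actual rule files are not shown on this page.

```python
import json
import re

# Hypothetical rule file contents. The real schema and terms are not shown
# in this write-up, so neutral placeholders are used here.
RULES_JSON = """
{
  "keywords": ["explicit_term_a", "explicit_term_b"],
  "emoji_patterns": ["🔞"],
  "safe_contexts": ["medical", "health education", "art history"]
}
"""


def load_rules(raw: str) -> dict:
    """Parse the JSON rules and precompile one keyword regex."""
    rules = json.loads(raw)
    alternatives = "|".join(re.escape(k) for k in rules["keywords"])
    # Word boundaries keep a keyword from matching inside a longer word.
    rules["keyword_re"] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return rules


def should_label(text: str, rules: dict) -> bool:
    """Return True if the post text matches a keyword or emoji rule."""
    lowered = text.lower()
    # Assumed safeguard against false positives: posts that mention a
    # legitimate context are not labeled by this text check.
    if any(ctx in lowered for ctx in rules["safe_contexts"]):
        return False
    if rules["keyword_re"].search(text):
        return True
    return any(emoji in text for emoji in rules["emoji_patterns"])


rules = load_rules(RULES_JSON)
print(should_label("a post with EXPLICIT_TERM_A in it", rules))
print(should_label("a medical note mentioning explicit_term_a", rules))
```

Keeping the rules in JSON rather than in code means the term lists can be reviewed and updated without touching the matching logic.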

Batching Test Video

Implementation: Our Labelers

Project Goals

Challenge:

Detecting sexual content on social platforms

Why it matters:

Enables user choice in content filtering

Protects minors from inappropriate content
Respects content creators' freedom of expression

Our approach:

Detection rather than censorship

Examples of harmful sexual content:

Sharing non-consensual nude images or videos (e.g., revenge porn).
Posting intimate content of someone online without their consent.
Sending unsolicited sexual videos, messages, or comments about a person’s body or appearance.

Technical Overview: Architecture

Core Components:

Sexual terminology dictionary with hierarchical structure

Pattern-based detection system using regular expressions

Context analysis with legitimate context filtering

Explicit intensity scoring (0-5 scale)

Image perceptual hash matching system
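The components above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code: the category names, weights, placeholder terms, the two-point reduction for legitimate context, and the Hamming-distance threshold are all assumptions. The hash shown is a simple average hash over a small grayscale grid; the actual system may use a different perceptual hash and would first resize the image (for example with OpenCV).

```python
import re

# Hypothetical hierarchical dictionary: category -> (intensity weight, terms).
TERM_HIERARCHY = {
    "suggestive": (1, ["placeholder_mild"]),
    "explicit": (3, ["placeholder_explicit"]),
    "severe": (5, ["placeholder_severe"]),
}
# Assumed markers of a legitimate context.
LEGITIMATE_CONTEXT = re.compile(r"\b(?:medical|education|research)\b", re.IGNORECASE)


def intensity_score(text: str) -> int:
    """Return a 0-5 score: the highest matched category weight,
    reduced by 2 when a legitimate-context marker is present."""
    score = 0
    for weight, terms in TERM_HIERARCHY.values():
        pattern = re.compile(
            r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
        )
        if pattern.search(text):
            score = max(score, weight)
    if score and LEGITIMATE_CONTEXT.search(text):
        score = max(0, score - 2)
    return min(score, 5)


def average_hash(gray: list) -> int:
    """Average hash of a small grayscale grid (rows of 0-255 values):
    one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def matches_known(gray: list, known_hashes: list, max_distance: int = 5) -> bool:
    """True if the grid's hash is within max_distance bits of any known hash."""
    h = average_hash(gray)
    return any(hamming(h, k) <= max_distance for k in known_hashes)


print(intensity_score("a research post about placeholder_explicit"))
print(matches_known([[10, 250], [240, 5]], [average_hash([[0, 255], [255, 0]])], 0))
```

Matching on Hamming distance rather than exact equality is what lets a perceptual hash tolerate small changes such as recompression or minor brightness shifts.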

Empirical Results

Analysis of Performance and Outcome

Key Findings

Perfect precision (100%) across all batches

In the test set, the labeler never flagged a benign post as containing sexual content.

High overall accuracy (94%)

The labeler correctly classified 94 out of 100 posts.

Strong recall (90%)

The labeler successfully identified 54 out of 60 posts containing sexual content.

Consistent performance

Processing time remained stable across all batches (approximately 0.27 seconds per post).

Batch variation

Batch 3 achieved perfect scores across all metrics, while Batch 4 had the lowest performance with 3 false negatives.
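The headline metrics can be reproduced from the counts reported above. The sketch below assumes a confusion matrix inferred from those figures: 54 true positives and 6 false negatives (54 of 60 positive posts found), 0 false positives (100% precision), and therefore 40 true negatives among the 100 posts. The function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Counts inferred from the reported results (see the assumptions above).
metrics = classification_metrics(tp=54, fp=0, fn=6, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
# prints:
# precision: 100%
# recall: 90%
# accuracy: 94%
```

With zero false positives, every accuracy loss comes from missed positives, which is why the 6 false negatives account for the full 6-point gap from 100% accuracy.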

Code Documentation and Style

The complete code is available on GitHub. Please refer to the README.md file for details.

The presentation with detailed methodologies and techniques can be found here.

rh692@cornell.edu