AI/ML, Web, Internship

JingleFace

An AI-powered Christmas campaign app that achieved 10K+ user interactions, delivering real-time Pixar-style image transformations through a fine-tuned diffusion model.

💡 Motivation

PicCollage wanted an engaging Christmas campaign that showcased its AI capabilities while giving users a fun, shareable experience. The goal was to transform user photos into Pixar-style animated characters.

🔧 Methods

Fine-tuned a Stable Diffusion model on Pixar-style character images. Implemented a web-based pipeline with image preprocessing, face detection, and style transfer. Optimized for real-time generation with efficient model serving.
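
The preprocessing, face detection, and style transfer stages described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the production code: `square_crop_around_face`, `run_pipeline`, the `margin` default, and the three callables (`detect_face`, `crop_and_resize`, `stylize`) are hypothetical names introduced here. The actual face detector and the call into the fine-tuned Stable Diffusion model are not specified in this write-up, so they are passed in as plain functions.

```python
from typing import Any, Callable, Optional, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def square_crop_around_face(
    face: Box, width: int, height: int, margin: float = 0.4
) -> Box:
    """Expand a face box by `margin` per side, make it square, keep it in the image."""
    left, top, right, bottom = face
    cx, cy = (left + right) / 2, (top + bottom) / 2
    # Square side: the longer face edge plus a margin on both sides.
    side = max(right - left, bottom - top) * (1 + 2 * margin)
    # The crop can never be larger than the image itself.
    side = min(side, width, height)
    # Shift the square so it stays inside the image bounds.
    x0 = min(max(cx - side / 2, 0), width - side)
    y0 = min(max(cy - side / 2, 0), height - side)
    return (round(x0), round(y0), round(x0 + side), round(y0 + side))


def run_pipeline(
    image: Any,
    width: int,
    height: int,
    detect_face: Callable[[Any], Optional[Box]],    # assumed: returns None if no face
    crop_and_resize: Callable[[Any, Box], Any],     # assumed: crops, then resizes for the model
    stylize: Callable[[Any], Any],                  # assumed: wraps the fine-tuned diffusion model
    margin: float = 0.4,
) -> Any:
    """Detect a face, crop a square region around it, and stylize the crop."""
    face = detect_face(image)
    if face is None:
        # Fail early so no model time is spent on photos without a face.
        raise ValueError("no face detected")
    box = square_crop_around_face(face, width, height, margin)
    return stylize(crop_and_resize(image, box))
```

Passing the stages in as callables keeps the geometry testable without loading a model. In a real service, `stylize` would presumably call the fine-tuned model behind the FastAPI endpoint.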

📊 Results

Reached 10K+ user interactions within weeks of launch, with an average generation time under 30 seconds. An 85% share rate on social media indicated high user satisfaction.

🛠 Tech Stack

Stable Diffusion, PyTorch, FastAPI, React, AWS

Other Projects

MOSAIC: Exploiting Compositional Blindness in Multimodal Alignment

A novel multimodal jailbreak framework that targets compositional blindness in Vision-Language Models. MOSAIC rewrites harmful requests into Action–Object–State triplets, renders them as stylized visual proxies, and induces state-transition reasoning to bypass safety guardrails. Published at NTU ML 2025 Fall Mini-Conference (Oral).

PEFT-STVG: Parameter-Efficient Fine-Tuning for Spatio-Temporal Video Grounding

Spatio-Temporal Video Grounding (STVG) localizes objects in video frames that match natural language queries across time. While effective, existing methods require full model fine-tuning, creating significant computational bottlenecks that limit scalability and accessibility.