AI/ML · Computer Vision · Course Project · ⭐ Featured

Fine-tuning LLaVA for Autonomous Driving (ECCV 2024 Challenge)

PREVISION: PRe-training Enhanced Versatile Integration of Semantics, Images, and Object Detection for Novel Corner Case Analysis in Autonomous Driving, NTU DLCV Fall 2024 Final Project | ECCV 2024 Autonomous Driving Challenge

📍 Challenge

This project extends LLaVA (Large Language and Vision Assistant) with a custom multimodal architecture for the ECCV 2024 Autonomous Driving Challenge. Our approach integrates:
  • RGB Images via CLIP vision encoder
  • Object Detection (34 autonomous driving classes) via custom bounding box encoder
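A minimal sketch of how a bounding box encoder of this kind could work, assuming each detection is given as a normalized box plus one of the 34 class IDs and is turned into one token in the language model's embedding space. The names `BoxEncoder` and `fuse_tokens`, the layer sizes, and the concatenation strategy are illustrative assumptions, not the project's actual implementation.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 34  # number of detection classes stated in the project description


class BoxEncoder(nn.Module):
    """Hypothetical encoder: maps each detected box to one LLM-sized token."""

    def __init__(self, hidden_dim: int, llm_dim: int, num_classes: int = NUM_CLASSES):
        super().__init__()
        # Learned embedding for the object class of each box.
        self.class_embed = nn.Embedding(num_classes, hidden_dim)
        # Linear lift of the 4 box coordinates (assumed normalized to [0, 1]).
        self.coord_proj = nn.Linear(4, hidden_dim)
        # Projector into the language model's embedding space (assumed 2-layer MLP).
        self.projector = nn.Sequential(
            nn.Linear(hidden_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, boxes: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # boxes: (batch, num_boxes, 4), labels: (batch, num_boxes) integer class IDs
        features = self.coord_proj(boxes) + self.class_embed(labels)
        return self.projector(features)  # (batch, num_boxes, llm_dim)


def fuse_tokens(image_tokens: torch.Tensor, box_tokens: torch.Tensor) -> torch.Tensor:
    """Concatenate projected image tokens and box tokens along the sequence axis."""
    return torch.cat([image_tokens, box_tokens], dim=1)
```

In this sketch the fused sequence would be placed alongside the text embeddings before being passed to the language model; the image tokens are assumed to come from the CLIP vision encoder after its own projector.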

🔍 Key Findings

  • LoRA fine-tuning with post-processing achieved the best performance (4.09)
  • Pre-training both projectors degraded performance due to error propagation from noisy detection labels
  • Multi-stage inference (knowledge transfer) did not improve results in our setting
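For readers unfamiliar with LoRA, the idea behind the fine-tuning finding can be sketched as follows: the pretrained weight stays frozen and only a low-rank update is trained. This is a generic illustration of the technique, not the project's training code; the class name `LoRALinear` and the `rank` and `alpha` defaults are assumptions.

```python
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weight and bias.
        for param in self.base.parameters():
            param.requires_grad = False
        # A is randomly initialized, B starts at zero, so the wrapped layer
        # initially produces exactly the same output as the base layer.
        self.lora_a = nn.Parameter(torch.empty(rank, base.in_features))
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank path: x -> A -> B, scaled by alpha / rank.
        update = (x @ self.lora_a.T) @ self.lora_b.T
        return self.base(x) + self.scale * update
```

Only `lora_a` and `lora_b` receive gradients, which is why this style of fine-tuning needs far fewer trainable parameters than updating the full weight matrix.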

🔧 Methods

The model performs three tasks:
  • General Perception: Describe all objects affecting the ego vehicle's driving behavior
  • Regional Perception: Explain specific objects highlighted in the scene
  • Driving Suggestions: Provide actionable driving recommendations based on scene understanding
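One way to route the three tasks to a single model is to give each task its own instruction text. The sketch below is hypothetical: the task keys, the `build_prompt` helper, and the exact wording are assumptions derived from the task descriptions above, and `<image>` is assumed as the image placeholder token in the prompt.

```python
# Hypothetical instruction text per task, paraphrased from the task list above.
TASK_INSTRUCTIONS = {
    "general": "Describe all objects affecting the ego vehicle's driving behavior.",
    "regional": "Explain the object highlighted in the scene.",
    "suggestion": "Provide actionable driving recommendations based on the scene.",
}


def build_prompt(task: str, image_token: str = "<image>") -> str:
    """Return the image placeholder followed by the instruction for one task."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    return f"{image_token}\n{TASK_INSTRUCTIONS[task]}"
```

Calling `build_prompt("regional")` would yield the placeholder line followed by the regional perception instruction.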

📸 Figures

Figure 1. Overall Pipeline of PREVISION.

📊 Results

Figure 2. Quantitative results on the ECCV 2024 Autonomous Driving Challenge.

🛠 Tech Stack

PyTorch · CUDA · LLaVA · FlashAttention