
🚀 Assignment 1: Classification Projects

Deep Learning Models across Image, Text, and Multimodal Domains
👥 Group TCPS Members: Đoàn Tấn Sang (2252711) • Lưu Chí Cường (2252097) • Ngô Nhật Tuấn (2213774) • Võ Hoàng Phúc (2252650)
👨‍🏫 Instructor: Dr. Lê Thành Sách | HCMUT - VNU-HCM

Select a dataset to explore the detailed report

🖼️

Image Dataset

Fashion-MNIST Classification

  • Size: 70,000 grayscale images (28×28 pixels)
  • Classes: 10 clothing categories
  • Backbones: CNN (DenseNet) vs. ViT
  • Highlights: Achieved ~94% accuracy; includes an analysis of the balanced class distribution.
📝

Text Dataset

VNTC (Vietnamese News Corpus)

  • Size: 84,132 Vietnamese articles
  • Classes: 10 news topics (Politics, Sports, etc.)
  • Backbones: RNN (LSTM) vs. Transformer (PhoBERT)
  • Highlights: Handled class imbalance; robust tokenization (word segmentation and BPE subwords).
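
The card above does not say how class imbalance was handled. One common option is to weight the loss by inverse class frequency; the sketch below illustrates that idea only and is not taken from the report. The function name `inverse_frequency_weights` and the toy labels are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    This is an assumed strategy, not necessarily the one used in the report.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy, made-up label list: 6 "sports" articles and 2 "politics" articles.
toy_labels = ["sports"] * 6 + ["politics"] * 2
weights = inverse_frequency_weights(toy_labels)
```

Weights like these would typically be passed to a weighted cross-entropy loss so that errors on rare topics count for more during training.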
🧩

Multimodal Dataset

MSCOCO (Microsoft Common Objects in Context)

  • Structure: Each image is paired with 5 captions
  • Classes: Subset of 5 animal categories
  • Approach: CLIP (Contrastive Language-Image Pre-training)
  • Highlights: Compared zero-shot and few-shot classification performance.
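
As a rough illustration of zero-shot classification with a contrastive model such as CLIP: the image embedding is compared with one text embedding per class prompt, and the closest prompt gives the prediction. The sketch below uses made-up 3-dimensional vectors in place of real encoder outputs. The class names, the function `zero_shot_classify`, and the temperature value are all assumptions for illustration, not details from the report.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """Pick the class whose prompt embedding is most similar to the image.

    text_embs maps a class name to the embedding of its text prompt.
    Returns (predicted_label, softmax probabilities over the classes).
    """
    labels = list(text_embs)
    logits = [cosine(image_emb, text_embs[name]) / temperature for name in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs

# Hypothetical embeddings; a real pipeline would obtain these from the
# model's image and text encoders.
text_embs = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "horse": [0.0, 0.0, 1.0],
}
image_emb = [0.1, 0.9, 0.2]
label, probs = zero_shot_classify(image_emb, text_embs)
```

In a few-shot setting, the text-prompt embeddings would instead be replaced or supplemented by a small classifier fitted on the embeddings of a handful of labeled images per class.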

🎥 Project Presentation

Watch our full team presentation covering all three domains (Image, Text, and Multimodal).

▶️ Watch Directly on YouTube