Deep Learning Models across Image, Text, and Multimodal Domains
👥 Group TCPS Members:
Đoàn Tấn Sang (2252711) • Lưu Chí Cường (2252097) • Ngô Nhật Tuấn (2213774) • Võ Hoàng Phúc (2252650)
👨‍🏫 Instructor: Dr. Lê Thành Sách | HCMUT - VNU-HCM
Select a dataset to explore the detailed report
🖼️
Image Dataset
Fashion-MNIST Classification
Size: 70,000 grayscale images (28x28)
Classes: 10 clothing categories
Backbones: CNN (DenseNet) vs. ViT
Highlights: Achieved ~94% accuracy; analyzed the balanced class distribution.
📝
Text Dataset
VNTC (Vietnamese News Corpus)
Size: 84,132 Vietnamese articles
Classes: 10 news topics (Politics, Sports, etc.)
Backbones: RNN (LSTM) vs. Transformer (PhoBERT)
Highlights: Handled class imbalance; applied robust word segmentation with BPE subword tokenization.
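The card above mentions handling class imbalance but does not say how. As a minimal sketch of one common option (inverse-frequency class weights, which is an assumption here and not necessarily the method used in the report), the weights could be computed as follows. The function name `inverse_frequency_weights` and the toy topic labels are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and frequent classes get weights
    below 1, so a weighted loss pays more attention to rare classes.
    This is an illustrative choice, not the report's confirmed method.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy, made-up label distribution (not the real VNTC counts).
toy_labels = ["sports"] * 6 + ["politics"] * 3 + ["science"] * 1
weights = inverse_frequency_weights(toy_labels)
```

Weights like these are typically passed to a weighted cross-entropy loss, so that errors on under-represented topics cost more during training.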
🧩
Multimodal Dataset
MSCOCO (Microsoft Common Objects in Context)
Structure: Each image paired with 5 captions
Classes: Subset of 5 animal categories
Approach: CLIP (Contrastive Language-Image Pre-training)
Highlights: Compared Zero-shot vs. Few-shot classification capabilities.
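To make the zero-shot versus few-shot comparison concrete, here is a minimal sketch of how the two modes differ once embeddings are available. It assumes image and text embeddings have already been produced by a CLIP-style encoder; the 3-dimensional vectors, the `cat`/`dog` labels, and all function names are made-up placeholders. The few-shot variant shown (nearest class prototype) is just one common option and may not match the report's setup.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_predict(image_emb, text_embs):
    """Zero-shot: pick the class whose text-prompt embedding is closest."""
    return max(text_embs, key=lambda label: cosine(image_emb, text_embs[label]))

def few_shot_prototypes(support):
    """Few-shot: average a handful of labeled image embeddings per class."""
    protos = {}
    for label, embs in support.items():
        dim = len(embs[0])
        protos[label] = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return protos

def few_shot_predict(image_emb, protos):
    """Pick the class whose image prototype is closest to the query."""
    return max(protos, key=lambda label: cosine(image_emb, protos[label]))

# Toy vectors standing in for real CLIP embeddings (hypothetical values).
text_embs = {"cat": [1.0, 0.1, 0.0], "dog": [0.1, 1.0, 0.0]}
support = {
    "cat": [[0.9, 0.2, 0.1], [1.0, 0.0, 0.2]],
    "dog": [[0.2, 0.9, 0.1], [0.0, 1.0, 0.3]],
}
query = [0.8, 0.3, 0.1]

zero_shot_label = zero_shot_predict(query, text_embs)
few_shot_label = few_shot_predict(query, few_shot_prototypes(support))
```

The key difference is the source of the class reference: zero-shot uses only text prompts and needs no labeled images, while few-shot builds the reference from a small number of labeled examples per class.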
🎥 Project Presentation
Watch our full team presentation covering all three domains (Image, Text, and Multimodal).