Dining on Details: LLM-Guided Expert Networks for Fine-Grained Food Recognition

Abstract

In fine-grained food recognition, subset learning-based methods group classes into subsets to guide the training process. We introduce Dining on Details (DoD), an expert learning framework for food classification that uses large language models to construct subsets of classes within the dataset. DoD's efficacy is rooted in the robustness of the ImageBind multi-modality embedding space, which can identify meaningful similarities across varied categories. Trained through an end-to-end multi-task learning process, the method improves performance on the fine-grained food recognition task, particularly on highly similar classes. A key advantage of DoD is its compatibility: it can be applied to any existing classification architecture. Our validation on various food datasets and backbones, both convolutional and transformer-based, shows competitive results, with performance gains ranging from 0.5% to 1.61%. Notably, it achieves state-of-the-art results on the Food-101 dataset.
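
As a rough illustration of the subset-based multi-task idea described in the abstract, the sketch below combines a loss over all classes with a loss from an "expert" restricted to the true label's subset. Everything in it is an assumption made for illustration: the subset names and members are hand-written (not LLM-generated), and `multitask_loss`, `subset_of`, and `expert_weight` are hypothetical names. The abstract does not specify the actual loss, weighting, or architecture used by DoD.

```python
import math

# Hypothetical, hand-written class subsets for illustration only.
# In the described method, subsets are constructed with a large language model.
SUBSETS = {
    "pasta": ["spaghetti_bolognese", "spaghetti_carbonara", "lasagna"],
    "desserts": ["tiramisu", "panna_cotta", "cheesecake"],
}


def cross_entropy(logits, target_index):
    """Negative log-softmax of the target entry, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def subset_of(label):
    """Return the name of the subset that contains the given class label."""
    for name, members in SUBSETS.items():
        if label in members:
            return name
    raise KeyError(label)


def multitask_loss(global_logits, expert_logits, label, classes, expert_weight=0.5):
    """Loss over all classes plus a weighted loss from the label's subset expert.

    expert_weight is an assumed hyperparameter, not a value from the paper.
    """
    global_loss = cross_entropy(global_logits, classes.index(label))
    name = subset_of(label)
    expert_loss = cross_entropy(expert_logits[name], SUBSETS[name].index(label))
    return global_loss + expert_weight * expert_loss


# Usage: uniform logits from a global head (6 classes) and two experts (3 classes each).
classes = [c for members in SUBSETS.values() for c in members]
experts = {"pasta": [0.0, 0.0, 0.0], "desserts": [0.0, 0.0, 0.0]}
loss = multitask_loss([0.0] * 6, experts, "lasagna", classes)
```

In this toy setup the expert only has to separate visually similar classes within one subset, which is the intuition behind giving highly similar classes a dedicated training signal.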

Publication
In Proceedings of the 8th International Workshop on Multimedia Assisted Dietary Management (MADiMa ’23), co-located with ACM Multimedia 2023
Jesús M. Rodríguez-de-Vera
PhD Candidate in Computer Vision

My research interests include computer vision, deep learning and artificial intelligence.