
Transforming Video Search: Leveraging Multimodal Techniques and LLMs for Optimal Retrieval

EasyChair Preprint 15807

11 pages. Date: February 11, 2025

Abstract

The rapid growth of online video content has made video search critical in the digital era. Traditional approaches, such as image-text retrieval and object-, audio-, color-, and text-based search, have advanced the field considerably. However, these approaches often need refinement when handling queries from many users at the same time, which can result in overlapping searches. Furthermore, current video search algorithms need to improve their ability to answer complex queries that require extracting information from several frames. Overcoming these limitations is critical for building scalable and user-friendly video search engines. In this work, we present an improved video search system with three main contributions: we enhance text detection for Vietnamese, employ image captioning to increase search relevance, and allow users to refine queries with a large language model (LLM) for greater precision. These contributions considerably increase the efficiency and accuracy of the search process. The intuitive interface supports searching by query, frame ID, and related image, and offers advanced features such as query expansion, result aggregation, and integrated feedback for improved search accuracy.

Keyphrases: embedding-based search, interactive video retrieval, multimodal and multimedia retrieval, text-based image retrieval

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15807,
  author    = {Binh Dinh and An Dao and Bao Trinh and Truong Dinh and Nguyen Vu},
  title     = {Transforming Video Search: Leveraging Multimodal Techniques and LLMs for Optimal Retrieval},
  howpublished = {EasyChair Preprint 15807},
  year      = {EasyChair, 2025}}