Transforming Video Search: Leveraging Multimodal Techniques and LLMs for Optimal Retrieval

EasyChair Preprint 15807
11 pages • Date: February 11, 2025

Abstract

The rapid growth of online video content has made video search critical in the digital era. Traditional approaches, such as image-text retrieval and object-, audio-, color-, and text-based search, have made considerable advances in this field. However, these approaches often need refinement when handling many simultaneous user queries, which can result in overlapping searches. Furthermore, current video search algorithms need to better handle complex queries that involve extracting information from several frames. Overcoming these limitations is critical for creating scalable and user-friendly video search engines. In this research, we present an improved video search system with three significant advances. We enhance text detection for Vietnamese, employ image captioning to increase search relevance, and allow users to refine queries with a large language model (LLM) for greater precision. These innovations considerably increase the efficiency and accuracy of the search process. The intuitive interface enables seamless searches by query, frame ID, and related image, while offering sophisticated features such as query expansion, result aggregation, and integrated feedback for enhanced search accuracy.

Keyphrases: embedding-based search, interactive video retrieval, multimodal and multimedia retrieval, text-based image retrieval