IJET 2025 Vol.17(3): 144-147
DOI: 10.7763/IJET.2025.V17.1318
Key Points of Multimodal Communication Design Driven by AI: Exploration of Visual, Auditory, and Textual Information Integration
Tianyang Chen
University of Arizona, Tucson, Arizona, USA
Email: tianyangchen@arizona.edu (T.Y.C.)
Manuscript received December 20, 2024; accepted March 17, 2025; published July 1, 2025.
Abstract—To address the low efficiency of multimodal information processing, this article analyzes AI-driven multimodal communication design and explores the key points of integrating visual, auditory, and textual information. It first introduces the idea of multimodal AI information integration and identifies some datasets that can be used. It then introduces AI technologies along three dimensions: visual information (image recognition and video analysis), auditory information (speech recognition and audio analysis), and textual information (natural language processing and sentiment analysis). Finally, Taobao Intelligent Customer Service is taken as a case study to illustrate the application value of multimodal AI technology. The analysis concludes that multimodal AI communication design has application value in areas such as providing high-quality services, improving user experience, and launching personalized services. In application, however, attention must also be paid to issues such as cost and technical loopholes so that the advantages of multimodal AI communication design can be maximized. It is hoped that this article can provide some reference for future research by relevant personnel.
Keywords—AI, multimodal, visual, auditory perception, text information integration
Cite: Tianyang Chen, "Key Points of Multimodal Communication Design Driven by AI: Exploration of Visual, Auditory, and Textual Information Integration," International Journal of Engineering and Technology, vol. 17, no. 3, pp. 144-147, 2025.
Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).