School of Intelligent Science and Engineering, Chengdu Neusoft University, Sichuan, China
Email: 594251028@qq.com
Abstract—With the rapid development of large language models and the deep integration of artificial intelligence into various industries, the requirements for accuracy and generalization ability in Natural Language Processing (NLP) are constantly rising. Large models built on the Transformer architecture are propelling NLP into a new stage. Chinese lacks natural word boundaries, which makes word segmentation a fundamental step in Chinese NLP, and its accuracy directly affects the effectiveness of downstream tasks. Owing to these linguistic characteristics, Chinese word segmentation has long faced three core challenges: inconsistencies between general vocabularies and segmentation standards, difficulties in handling ambiguous segments, and poor performance in out-of-vocabulary word identification. This paper highlights the advantages of deep learning for word segmentation, detailing classic neural network models such as CNN, RNN, LSTM, and BiLSTM-CRF, as well as the application of pre-trained models such as BERT and RoBERTa and of lightweight real-time models. The research indicates that deep learning-based word segmentation methods offer the best overall performance and effectively address the challenges of ambiguous segmentation and out-of-vocabulary word recognition. Different algorithms and systems can meet the diverse needs of scientific research, industry, and vertical fields. This study clarifies the evolution of Chinese word segmentation technology and provides a reference for the selection, engineering implementation, and optimization of word segmentation algorithms in the large model era, supporting the high-quality development of Chinese natural language processing.
Keywords—large language model, Chinese word segmentation, deep learning, natural language processing
Cite: Yuemeng Ren, "Research on Core Issues and Mainstream Algorithms of Chinese Word Segmentation," International Journal of Engineering and Technology, vol. 18, no. 2, pp. 49-52, 2026.
Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).