A Comparative Analysis of the Readability and Information Quality of the Chinese and English Versions of Educational Materials for Thoracic Surgery Patients Generated by DeepSeek, Grok-3 and ChatGPT
DOI: https://doi.org/10.62177/apjcmr.v1i4.731

Keywords: Thoracic Surgery, Thoracoscopic Lobectomy, Large Language Models (LLMs), Patient Educational Materials, Readability, Information Quality, Bilingual Comparison

Abstract
Objective: To comparatively analyze the readability and information quality of Chinese- and English-language educational materials for patients undergoing thoracoscopic lobectomy generated by three mainstream large language models (LLMs), namely DeepSeek, Grok-3, and ChatGPT, and to provide an evidence-based basis for the clinical selection of AI-assisted education tools.
Method: A cross-sectional study design was adopted, with "education for patients undergoing thoracoscopic lobectomy" as the core requirement. Standardized Chinese and English prompts drove each of the three models to generate three independent educational materials (18 in total: 9 in Chinese, 9 in English). Readability was evaluated with internationally recognized assessment tools (English: Flesch-Kincaid Grade Level, FKGL, and Flesch Reading Ease, FRE; Chinese: average sentence length), and information quality was evaluated with the DISCERN instrument. Differences among the three models were compared with the Kruskal-Wallis H test, differences between the Chinese and English versions were analyzed with the paired-sample t-test, and inter-rater reliability was assessed with the intraclass correlation coefficient (ICC).
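For reference, a minimal Python sketch of the readability metrics named above, using the standard Flesch formulas; the study's exact tooling is not specified, so the tokenization and the vowel-group syllable counter here are illustrative assumptions.

```python
# Sketch of the readability metrics (FRE, FKGL, Chinese average sentence
# length). The syllable counter is a rough heuristic, not the study's tool.
import re

def count_syllables(word: str) -> int:
    """Approximate English syllables as contiguous vowel groups."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)           # words per sentence
    spw = syllables / len(words)                # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw    # Flesch Reading Ease
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid Grade Level
    return fre, fkgl

def avg_sentence_length_zh(text: str) -> float:
    """Average Chinese sentence length in characters, punctuation excluded."""
    sentences = [s for s in re.split(r"[。！？；]", text) if s.strip()]
    chars = sum(len(re.findall(r"[\u4e00-\u9fff]", s)) for s in sentences)
    return chars / len(sentences)
```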
Result: 1. Readability: Among the English versions, DeepSeek V3 had the highest FRE score (80.36±1.18) and the lowest FKGL score (4.83±0.12), significantly better than ChatGPT-o3 (FRE: 67.36±0.74, FKGL: 6.56±0.36) and Grok-3 (FRE: 45.67±1.65, FKGL: 11.93±0.17) (P<0.05). Among the Chinese versions, Grok-3 had the shortest average sentence length (17.74±1.02 characters), significantly better than ChatGPT-o3 (27.81±1.47 characters) and DeepSeek V3 (26.75±1.18 characters) (P<0.05).
2. Information quality: Inter-rater reliability was excellent (ICC=0.92, 95% CI: 0.925-0.998, P<0.001). The DISCERN total scores of the Chinese and English versions of all three models were at the "good to excellent" level (59.00-71.17 points). ChatGPT-o3 scored highest in both languages (English: 71.17±1.17; Chinese: 70.50±0.55) and Grok-3 scored lowest (English: 63.17±0.94; Chinese: 59.00±0.89); the between-group differences were statistically significant (P<0.05).
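The statistical comparisons reported above could be reproduced along these lines; a sketch using scipy for the Kruskal-Wallis H test and the paired t-test and pingouin for the ICC, with placeholder values rather than the study's data.

```python
# Illustrative sketch of the statistical analyses; all input values below
# are placeholders, not the study's measurements.
import pandas as pd
import pingouin as pg
from scipy import stats

# FRE scores of the three English materials per model (placeholder values)
deepseek = [80.1, 80.5, 80.5]
chatgpt  = [67.0, 67.5, 67.6]
grok     = [44.2, 46.1, 46.7]

# Kruskal-Wallis H test comparing the three models
h, p = stats.kruskal(deepseek, chatgpt, grok)
print(f"Kruskal-Wallis: H = {h:.2f}, P = {p:.4f}")

# Paired-sample t-test between Chinese and English DISCERN totals
zh = [70.0, 71.0, 70.5]
en = [71.0, 72.0, 70.5]
t, p = stats.ttest_rel(zh, en)
print(f"Paired t-test: t = {t:.2f}, P = {p:.4f}")

# Inter-rater reliability via the intraclass correlation coefficient
ratings = pd.DataFrame({
    "material": [1, 1, 2, 2, 3, 3],
    "rater":    ["A", "B", "A", "B", "A", "B"],
    "score":    [70, 71, 64, 63, 60, 59],
})
icc = pg.intraclass_corr(data=ratings, targets="material",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```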
Conclusion: Among the thoracoscopic lobectomy educational materials generated by the three LLMs, the English version from DeepSeek V3 has the best readability, the Chinese version from Grok-3 shows outstanding reading fluency, and ChatGPT-o3 performs in a balanced way across both languages. The Chinese versions still need optimization in terminology consistency and informational detail. In clinical application, the model should be selected according to language requirements, and AI-generated content should be professionally reviewed.
References
Bray, F., Laversanne, M., Sung, H., Ferlay, J., Siegel, R. L., Soerjomataram, I., & Jemal, A. (2024). Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 74(3), 229-263. https://doi.org/10.3322/caac.21834
Jiao, W., Zhao, L., Mei, J., Zhong, J., Yu, Y., Bi, N., ... & Gao, S. (2025). Clinical practice guidelines for perioperative multimodality treatment of non-small cell lung cancer. Chinese Medical Journal (English Edition). https://doi.org/10.1097/cm9.0000000000003635
Lee, P., Bubeck, S., & Petro, J. (2023). Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. The New England Journal of Medicine, 388(13), 1233-1239. https://doi.org/10.1056/NEJMsr2214184
Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutierrez, L., Tan, T. F., & Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29(8), 1930-1940. https://doi.org/10.1038/s41591-023-02448-8
Choudhury, A., Shahsavar, Y., & Shamszare, H. (2025). User intent to use DeepSeek for healthcare purposes and their trust in the large language model: Multinational survey study. JMIR Human Factors. https://doi.org/10.2196/72867
Bhushan, R., & Grover, V. (2024). The advent of artificial intelligence into cardiac surgery: A systematic review of our understanding. Brazilian Journal of Cardiovascular Surgery, 39(5), e20230308. https://doi.org/10.21470/1678-9741-2023-0308
Denecke, K., May, R., & Rivera Romero, O. (2024). Potential of large language models in health care: Delphi study. Journal of Medical Internet Research, 26(5), e52399. https://doi.org/10.2196/52399
Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine, 183(6), 589-596. https://doi.org/10.1001/jamainternmed.2023.1838
Khalpey, Z., Kumar, U., King, N., Abraham, A., & Khalpey, A. H. (2024). Large language models take on cardiothoracic surgery: A comparative analysis of the performance of four models on American Board of Thoracic Surgery exam questions in 2023. Cureus, 16(7), e65083. https://doi.org/10.7759/cureus.65083
xAI. (2025). Grok3: Redefining AI capabilities. Retrieved from https://xai.com/grok3
OpenAI. (2024). ChatGPT Technical Report. Retrieved from https://openai.com/research/chatgpt
Charnock, D., Shepperd, S., Needham, G., & Gann, R. (1999). DISCERN: An instrument for judging the quality of written consumer health information on treatment choices. Journal of Epidemiology & Community Health, 53(2), 105-111. https://doi.org/10.1136/jech.53.2.105
License
Copyright (c) 2025 Shiyu Wang, Yuan Yu

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Accepted: 2025-10-16
Published: 2025-11-01