National Health Research Institutes (NHRI): Item 3990099045/15625
    Please use this permanent URL to cite or link to this item: http://ir.nhri.org.tw/handle/3990099045/15625


    Title: Unlocking the secrets behind advanced artificial intelligence language models in deidentifying Chinese-English mixed clinical text: Development and validation study
    Authors: Lee, YQ;Chen, CT;Chen, CC;Lee, CH;Chen, P;Wu, CS;Dai, HJ
    Contributors: National Center for Geriatrics and Welfare Research;National Institute of Cancer Research
    Abstract:
    BACKGROUND: The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code mixing. Most current clinical natural language processing techniques are designed for monolingual text, and there is a need to address the deidentification of code-mixed text.
    OBJECTIVE: The aim of this study was to investigate the effectiveness and underlying mechanism of fine-tuned pretrained language models (PLMs) in identifying PHI in the code-mixed context. Additionally, we aimed to evaluate the potential of prompting large language models (LLMs) for recognizing PHI in a zero-shot manner.
    METHODS: We compiled the first clinical code-mixed deidentification data set consisting of text written in Chinese and English. We explored the effectiveness of fine-tuned PLMs for recognizing PHI in code-mixed content, with a focus on whether PLMs exploit naming regularity and mention coverage to achieve superior performance, by probing the developed models' outputs to examine their decision-making process. Furthermore, we investigated the potential of prompt-based in-context learning of LLMs for recognizing PHI in code-mixed text.
    RESULTS: The developed methods were evaluated on a code-mixed deidentification corpus of 1700 discharge summaries. We observed that different PHI types had preferences in their occurrences within the different types of language-mixed sentences, and PLMs could effectively recognize PHI by exploiting the learned name regularity. However, the models may exhibit suboptimal results when regularity is weak or mentions contain unknown words that the representations cannot generate well. We also found that the availability of code-mixed training instances is essential for the model's performance. Furthermore, the LLM-based deidentification method was a feasible and appealing approach that can be controlled and enhanced through natural language prompts.
    CONCLUSIONS: The study contributes to understanding the underlying mechanism of PLMs in addressing the deidentification process in the code-mixed context and highlights the significance of incorporating code-mixed training instances into the model training phase. To support the advancement of research, we created a manipulated subset of the resynthesized data set available for research purposes. Based on the compiled data set, we found that the LLM-based deidentification method is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, the use of such methods in the hospital setting requires careful consideration of data security and privacy concerns. Further research could explore the augmentation of PLMs and LLMs with external knowledge to improve their strength in recognizing rare PHI.
    Date: 2024-01-25
    Relation: Journal of Medical Internet Research. 2024 Jan 25;26:Article number e48443.
    Link to: http://dx.doi.org/10.2196/48443
    JIF/Ranking 2023: http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=NHRI&SrcApp=NHRI_IR&KeyISSN=1438-8871&DestApp=IC2JCR
    Cited Times(WOS): https://www.webofscience.com/wos/woscc/full-record/WOS:001167441500001
    Cited Times(Scopus): https://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85183509740
    Appears in Collections: [吳其炘] Journal Articles
    [Others] Journal Articles

    Files in this item:

    File             Description  Size    Format     Views
    PUB38271060.pdf               1274Kb  Adobe PDF  137


    All items in NHRI are protected by original copyright.
