National Health Research Institutes (NHRI): Item 3990099045/14074
    Please use this identifier to cite or link to this item: http://ir.nhri.org.tw/handle/3990099045/14074


    Title: Principle-based approach for the de-identification of code-mixed electronic health records
    Authors: Wang, C;Wang, F;Lee, Y;Chen, P;Wang, B;Su, C;Kuo, C;Wu, C;Chien, Y;Dai, H;Tseng, VS;Hsu, W
    Contributors: National Center for Geriatrics and Welfare Research;National Institute of Cancer Research
    Abstract: Code-mixing is the phenomenon in which at least two languages are combined in a hybrid way within a single conversation. The use of mixed language is widespread in multilingual and multicultural countries and poses significant challenges for the development of automated language processing tools. In Taiwan’s electronic health record (EHR) systems, unstructured EHR texts are usually written in a mixture of English and Chinese, which makes the de-identification and synthesis of protected health information (PHI) difficult. We explored this problem by applying several state-of-the-art pre-trained mono- and multilingual language models, and we propose applying the principle-based approach (PBA) to the tasks of PHI recognition and resynthesis on a code-mixed EHR corpus annotated with 6 main categories and 25 subcategories of PHI. In PBA, a hierarchical principle slot schema is defined to encode knowledge of code-mixed PHI; the defined slots are learned from the training set and assembled into principles that recognize PHI mentions and synthesize surrogates at the same time. A semantic disambiguation process is developed to resolve ambiguous PHI categories during de-identification and to dynamically extend the knowledge encoded in PBA during the knowledge augmentation process. The experimental results demonstrate that the proposed method achieves the best micro- and macro-F-scores compared with the other mono- and multilingual language models fine-tuned on our code-mixed corpus.
    Date: 2022-02-01
    Relation: IEEE Access. 2022 Feb;10:22875-22885.
    Link to: http://dx.doi.org/10.1109/ACCESS.2022.3148396
    JIF/Ranking 2023: http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=NHRI&SrcApp=NHRI_IR&KeyISSN=2169-3536&DestApp=IC2JCR
    Cited Times(WOS): https://www.webofscience.com/wos/woscc/full-record/WOS:000766560600001
    Cited Times(Scopus): https://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85124199713
    Appears in Collections: [Chi-Shin Wu] Periodical Articles
    [Others] Periodical Articles
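
The abstract reports both micro- and macro-F-scores. As a minimal sketch of how the two averages differ, the snippet below computes them from per-category counts. The category labels and the true-positive/false-positive/false-negative counts are made up for illustration; they are not the paper's PHI categories or results, and the function names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical per-category counts: category -> (true pos., false pos., false neg.)
Counts = Dict[str, Tuple[int, int, int]]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts: Counts) -> float:
    """Pool the counts over all categories, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts: Counts) -> float:
    """Compute F1 per category, then take the unweighted mean."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Illustrative numbers only (assumed, not taken from the paper).
example_counts: Counts = {
    "NAME": (90, 10, 10),
    "DATE": (40, 5, 15),
    "ID": (2, 1, 7),
}

if __name__ == "__main__":
    print(f"micro-F1 = {micro_f1(example_counts):.3f}")
    print(f"macro-F1 = {macro_f1(example_counts):.3f}")
```

With these assumed counts the rare, poorly recognized category pulls the macro average well below the micro average, which is why a corpus with many sparse subcategories is usually evaluated with both.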

    Files in This Item:

    File                 Description   Size    Format
    SCP85124199713.pdf                 816Kb   Adobe PDF   177


    All items in NHRI are protected by copyright, with all rights reserved.
