Fraunhofer FKIE tool makes cancer registry work easier
How can cancer registries make the best possible use of the flood of cancer data they receive every day for research and therapy? This was the question addressed by the “TeMeK” project, which was funded by the German Federal Ministry of Health and has now been successfully completed. The result is a tool that uses text mining and data science to support documentation and make the data easier to use. It saves documentarians time, contributes to standardization, and brings to light aspects that were previously lost in the flood of data. The collaboration will therefore continue.
Molecular genetics is becoming increasingly relevant in the diagnosis, prognosis, and treatment of cancer, because the type of gene mutation determines which therapy a cancer responds to best. For this reason, findings from gene sequencing of tumor cells have also been sent to the state cancer registries for several years. The registries collect all data on tumor diseases, and this data forms the basis for evaluations that help improve the care of cancer patients. However, medical findings are often unstructured text, and each author records gene mutations in their own way. The data therefore holds a wealth of undiscovered knowledge that big data analytics can unlock, which can improve cancer treatment in the long term.
The aim of the “TeMeK” project, short for “Text mining of notification texts for uniform classifications,” was therefore to convert these findings into a structured, machine-readable form using AI methods. To this end, the data scientists at Fraunhofer FKIE, together with partners and in coordination with the Baden-Württemberg Cancer Registry, developed a tool that efficiently extracts information from these complex free texts, validates it, and maps it onto a uniform schema. Theresa Nindel, who works in the field of text processing, says: “Approximately 21,000 findings from two years were evaluated, identifying approximately 700,000 statements about genes in 10 million words and condensing them into approximately 43,000 mutation assessments.”
The tool thus closes a critical gap in data processing: preparing the information in this way makes it possible to evaluate it quantitatively. By automating this step, the tool saves documentarians considerable processing time. “The project shows how scientific methods and registry practice can work together: more efficient processes, better data quality, and greater comparability. That's why we want to continue the collaboration even after the project ends,” explains Prof. Dr. Marco Halber from the Baden-Württemberg Cancer Registry.
The work of the FKIE scientists also provided important insights into how the description of gene mutations can be standardized. “The project has advanced the standardization of notation at the findings and documentation level,” says Dr. Hanna Geppert, project and group leader at Fraunhofer FKIE, “which the documentarians identified as another major added value.” Finally, standardization and automation revealed aspects of the data that had previously gone unnoticed. In addition to the Baden-Württemberg Cancer Registry, the participating cancer registries were the Hessian Cancer Registry, the Brandenburg and Berlin Clinical Cancer Registry, and the Rhineland-Palatinate Cancer Registry. They are so convinced of the project's results that the collaboration will continue after the end of the project under the leadership of the Baden-Württemberg Cancer Registry, and the results will be made available to all cancer registries in Germany.