Background: Cancer prevention efforts would not only diminish morbidity and social impact, but also reduce health costs. When patients are attended at health care centers, their data are routinely collected in electronic health record (EHR) databases. We hypothesize that these data, properly processed, can help identify subjects at higher risk of developing cancer within the next few years. EHR data can also be used to predict cancer outcomes such as early unexpected mortality, toxicity to chemotherapy, progression or metastasis.
Aims: We propose to develop and validate machine-learning-based cancer risk models for common cancer sites using routinely collected health data. The second aim is to predict cancer outcomes using data extracted from EHR.
Patients and methods: To develop cancer risk models we will use data provided by the Public Data Analysis for Health Research and Innovation Program (PADRIS), which covers 80% of the Catalan population attended by 71 hospitals, 369 primary care centers, and 96 social-health centers, among other services. We expect to work with over 150,000 cancer cases and 1.5 million controls, corresponding to the new cases diagnosed during 2012-17. Classical and deep-learning neural-network methods will be used to model cancer risk and exploit the longitudinal nature of the data. For the second aim, text-mining algorithms will be developed to extract structured data from text-based EHR, and machine learning strategies will be fitted to predict cancer outcomes. Over 20,000 patients diagnosed during 2013-15 and followed up for at least 5 years will be studied as the target population. Data from 35,000 patients diagnosed during 2016-2020 will be available and used for retrospective and prospective model validation.
Expected results: Once validated, these tools will be incorporated into the health information system to continuously monitor patient data and help identify patients at higher risk of developing cancer or adverse cancer outcomes.
- Doctoral Degree
- Standard European Fellowships (ST): candidates must not have resided or carried out their main activity (work, studies, etc.) in Spain for more than 12 months between 11th September 2016 and 11th September 2019.
- Career Restart Panel (CAR) or Reintegration Panel (RI): candidates must not have resided or carried out their main activity (work, studies, etc.) in Spain for more than 36 months between 11th September 2014 and 11th September 2019.
- Global Fellowship (GF): open to Experienced Researchers (ER) who are nationals or long-term residents of MS/AC (a period of full-time research activity in MS/AC of at least 5 consecutive years) and who have not resided or carried out their main activity (work, studies, etc.) in the Third Country (TC) for more than 12 months between 11th September 2016 and 11th September 2019.
- Fluent English (written and spoken)
- Proven knowledge of machine learning (at least one first- or corresponding-author publication of a research paper that uses this technique).
- Experience with cancer, genomics and/or electronic health record (EHR) databases will be a plus.
Highly motivated candidates should send a complete CV, a cover letter and two reference contacts before June 13th 2019.