2018 KDD Workshop on Machine Learning for Medicine and Healthcare

 Location: London, United Kingdom
 Workshop Date: August 20, 2018


  • Program and Invited Speaker lineup published.
  • All accepted papers will be part of the KDD Health Day Spotlight round in the morning session. Authors should use the Spotlight Template.
  • Accepted paper list published. See Papers.
  • Paper acceptance notification updated to Jun 21, 2018
  • Flyer released. See here.
  • Paper submission deadline extended to May 25, 2018
  • Announcing a $1,000 travel grant for the best selected student paper at the workshop. Please see the updated call for papers.
  • We have been selected to be part of KDD Health Day.


In recent years, the decreasing cost of data acquisition and the ready availability of data sources such as electronic health records (EHR), claims, administrative data, and patient-generated health data (PGHD), as well as unstructured data, have led to an increased focus on data-driven and machine learning (ML) methods in the medical and healthcare domain. From the systems biology point of view, large multimodal datasets, typically including omics, clinical measurements, and imaging data, are now readily available. Valuable information for obtaining mechanistic insight into disease is also available in unstructured formats, for example in the scientific literature. The storage, integration, and analysis of these data present significant challenges for translational medicine research and affect how effectively the data can be exploited. Furthermore, intelligent analysis of observational data from EHR and PGHD sources, and integration of the resulting insights into the systems biology sphere, can greatly improve patient experience and outcomes and the overall health of the population while reducing the per capita cost of care. However, the black-box nature inherent in some of the best-performing ML methods has widened the gap between how humans and machines think, and these methods often fail to provide the explanations needed to make insights actionable. In a new era in which users have a “right to explanation”, this is detrimental to adoption in practice. To turn such rich yet heterogeneous datasets into actionable insights, we aim to bring together a wide array of stakeholders, including practitioners, biomedical and data science specialists, and industry solution subject matter experts. We will seek to start discussions in the area of precision medicine, as well as on the importance of interpretability of ML models for increasing the practical use of ML in medicine and healthcare.


MLMH 2018 will be held as part of KDD Health Day on August 20, 2018. All accepted papers will be shown as part of the KDD Health Day Spotlight session (see below). In addition, papers selected for oral presentation will have the opportunity to take part in the evening KDD poster session as part of KDD Health Day.

Please refer to the KDD 2018 full schedule for up-to-date changes on venues and timings.

KDD Health Day Morning Session

Venue:   ICC Capital Suite Room 12 and 13 (Level 3)
Agenda:  Panel discussion and Spotlight presentations
Timing:  8:00 am - 9:15 am, August 20
MLMH participation:  All accepted papers

Please check KDD Health Day for more details.

KDD Health Day Poster Session

Venue:  ICC Capital Hall (Level 0)
Agenda:  Invited posters from Health Day at KDD Poster Reception
Timing:  7:00 pm - 9:30 pm, August 20
MLMH participation:  Papers selected for oral presentation

Please check KDD Program for more details.

MLMH Schedule

Venue: ICC Capital Suite Room 12 (Level 3)

Time                 Session
1:30 pm - 1:40 pm    Introduction and Welcome; Presentation of Best Student Paper award
1:40 pm - 2:10 pm    Invited Talk: Seeking Sophisticated but Interpretable Machine Learning Models for Healthcare Applications
                       Speaker: Jimeng Sun, Georgia Tech
2:10 pm - 2:30 pm    Select Paper Presentations (10 mins each)
                       #19: PIVETed-Granite: Computational Phenotypes through Constrained Tensor Factorization (Jette Henderson)
                       #12: Forecasting Disease Trajectories in Alzheimer's Disease Using Deep Learning (Bryan Lim)
2:30 pm - 3:00 pm    Coffee and Poster Session
3:00 pm - 3:50 pm    Select Paper Presentations (10 mins each)
                       #9: Predicting Infant Motor Development Status using Day Long Movement Data from Wearable Sensors (David Goodfellow)
                       #11: Interpretable Patient Mortality Prediction with Multi-value Rule Sets (Tong Wang)
                       #24: Mapping Unparalleled Clinical Professional and Consumer Languages with Embedding Alignment (Wei-Hung Weng)
                       #29: Ensemble learning with Conformal Predictors: Targeting credible predictions of conversion from Mild Cognitive Impairment to Alzheimer’s Disease (Telma Pereira)
                       #33: Multi-Task Learning with Incomplete Data for Healthcare (Xin Hunt)
3:50 pm - 4:20 pm    Invited Talk: Machine learning with health record knowledge graphs
                       Speaker: Daniel Bean, King's College London
4:20 pm - 4:55 pm    Panel Discussion
                       Fred Rahmanian (Moderator), Ankur Teredesai, Sheng-Hua Bao, Aldo Faisal
4:55 pm - 5:00 pm    Concluding Remarks (speaker TBD)

Invited Speakers

Dr. Jimeng Sun, Georgia Tech, USA.

Bio: Jimeng Sun is an Associate Professor in the College of Computing at Georgia Tech. Prior to Georgia Tech, he was a researcher at IBM TJ Watson Research Center. His research focuses on health analytics and data mining, especially on designing tensor factorizations, deep learning methods, and large-scale predictive modeling systems. Dr. Sun has collaborated with many healthcare organizations. He has published over 120 papers and filed over 20 patents (5 granted). He received the SDM/IBM early career research award in 2017, the ICDM best research paper award in 2008, the SDM best research paper award in 2007, and the KDD Dissertation runner-up award in 2008. Dr. Sun received his B.S. and M.Phil. in Computer Science from Hong Kong University of Science and Technology in 2002 and 2003, and his PhD in Computer Science from Carnegie Mellon University in 2007, advised by Christos Faloutsos.

Title: Seeking Sophisticated but Interpretable Machine Learning Models for Healthcare Applications
Time: 1:40 pm - 2:10 pm
Abstract: People often talk about the trade-off between model accuracy and interpretability. However, in healthcare, we need both. In this talk, I will present two examples of sophisticated models that can be accurate yet interpretable. 1) Intensive Care Unit (ICU) outcome prediction: Integrating high-density ICU monitoring data with discrete clinical events (including diagnoses, medications, and labs) is challenging but potentially rewarding, since the richness and granularity of such multimodal data increase the possibilities for accurately detecting complex problems and predicting outcomes (e.g., length of stay and mortality). We propose the Recurrent Attentive and Intensive Model (RAIM) for jointly analyzing continuous monitoring data and discrete clinical events. RAIM introduces an efficient attention mechanism for continuous monitoring data (e.g., ECG), which is guided by discrete clinical events (e.g., medication usage). 2) Heart Failure Phenotyping: Understanding subtypes of heart failure patients is extremely hard but important. We propose an integer tensor factorization method, SUSTain, to model electronic health records. EHR data are commonly represented by integers (e.g., the values correspond to event counts or ordinal measures). The conventional approach is to treat integer data as real, and then apply real-valued factorizations. However, doing so fails to preserve important characteristics of the original data, thereby making it hard to interpret the results. In our preliminary study, 87% of the resulting phenotypes were clinically meaningful.

Dr. Daniel Bean, King's College London, UK.

Bio: Dr Dan Bean is a UKRI Innovation Fellow funded by Health Data Research UK. He works closely with clinical collaborators across King’s Health Partners applying machine learning to patient records at scale. His research develops machine learning methods based on knowledge graphs that combine large public datasets with health records to predict and explain patient outcomes. The focus is on delivering real world clinical value through multiple clinical collaborations including atrial fibrillation management, patient flow, adverse drug reactions, cancer subtyping and kidney failure.

Title: Machine learning with health record knowledge graphs
Time: 3:50 pm - 4:20 pm
Abstract: Knowledge graphs are a powerful framework for predictive machine learning with electronic health records. We recently showed that a knowledge graph-based algorithm outperformed standard methods (logistic regression, decision trees and support vectors) at predicting unknown adverse reactions to drugs already on the market. I will give an overview of the real-time clinical data integration (CogStack) and semantic search (SemEHR) platforms and discuss how we have used this framework for predictive machine learning in several clinical areas including adverse drug reactions, kidney failure and clinical risk scoring.

Accepted Papers

We have accepted 22 papers for presentation at the workshop. All papers will be presented as posters within the workshop. In addition, we have selected 7 papers for oral presentation. PDF versions of the final papers, where provided by the authors, are hyperlinked below.

Oral Presentations

#9   Predicting Infant Motor Development Status using Day Long Movement Data from Wearable Sensors
David Goodfellow, Ruoyu Zhi, Rebecca Funke, José Carlos Pulido, Maja Matarić and Beth A. Smith
#11   Interpretable Patient Mortality Prediction with Multi-value Rule Sets
Tong Wang, Veerajalandhar Allareddy, Sankeerth Rampa and Veerasathpurush Allareddy
#12   Forecasting Disease Trajectories in Alzheimer's Disease Using Deep Learning
Bryan Lim and Mihaela van der Schaar
#19   PIVETed-Granite: Computational Phenotypes through Constrained Tensor Factorization
Jette Henderson, Bradley A. Malin, Joyce C. Ho and Joydeep Ghosh
#24   Mapping Unparalleled Clinical Professional and Consumer Languages with Embedding Alignment
Wei-Hung Weng and Peter Szolovits
#29   Ensemble learning with Conformal Predictors: Targeting credible predictions of conversion from Mild Cognitive Impairment to Alzheimer’s Disease
Telma Pereira, Sandra Cardoso, Dina Silva, Manuela Guerreiro, Alexandre de Mendonça and Sara C. Madeira
#33   Multi-Task Learning with Incomplete Data for Healthcare
Xin Hunt, Saba Emrani, Ilknur Kaynar Kabul and Jorge Silva

Poster Presentations

#6   Mammography Dual View Mass Correspondence
Shaked Perek, Ayelet Akselrod-Ballin, Alon Hazan and Ella Barkan
#15   Transfer Learning for Clinical Time Series Analysis using Recurrent Neural Networks
Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig and Gautam Shroff
#20   YouTube for Patient Education: A Deep Learning Approach for Understanding Medical Knowledge from User-Generated Videos
Xiao Liu, Bin Zhang, Anjana Susarla and Rema Padman
#21   Recognising Cardiac Abnormalities in Wearable Device Photoplethysmography (PPG) with Deep Learning
Stewart Whiting
#25   Online Heart Rate Prediction using Acceleration from a Wrist Worn Wearable
Ryan Mcconville, Gareth Archer, Ian Craddock, Herman Ter Horst, Robert Piechocki, James Pope and Raul Santos-Rodriguez
#27   A hybrid deep learning approach for medical relation extraction
Veera Raghavendra Chikka and Kamalakar Karlapalem
#30   Building a Controlled Vocabulary for Standardizing Precision Medicine Terms
Meng Wu, Yan Liu, Hongyu Kang, Si Zheng, Jiao Li and Li Hou
#32   Measuring the quality of Synthetic data for use in competitions
James Jordon, Jinsung Yoon and Mihaela van der Schaar
#34   Generating Synthetic but Plausible Healthcare Record Datasets
Laura Aviñó, Matteo Ruffini and Ricard Gavaldà
#35   Mammography Assessment using Multi-Scale Deep Classifiers
Ulzee An, Khader Shameer and Lakshmi Subramanian
#39   Hierarchical Deep Learning Ensemble to Automate the Classification of Breast Cancer Pathology Reports by ICD-O Topography
Waheeda Saib, David Sengeh, Gcininwe Dlamini and Elvira Singh
#42   Synthetic Sampling for Multi-Class Malignancy Prediction
Matthew Yung, Eli T. Brown, Alexander Rasin, Jacob D. Furst and Daniela S. Raicu
#43   PGLasso: Microbial Community Detection through Phylogenetic Graphical Lasso
Chieh Lo and Radu Marculescu
#45   Murmur Detection Using Parallel Recurrent & Convolutional Neural Networks
Shahnawaz Alam, Rohan Banerjee and Soma Bandyopadhyay
#47   From Text to Topics in Healthcare Records: An Unsupervised Graph Partitioning Methodology
M. Tarik Altuncu, Erik Mayer, Sophia N. Yaliraki and Mauricio Barahona

Call for Papers

We invite full papers, as well as work-in-progress papers, on the application of machine learning to precision medicine and healthcare informatics. Topics may include, but are not limited to, the following (for more information, see the workshop overview):

  • Data Standards for Translational Medicine Informatics
  • Analysis of large scale electronic health records or patient-generated health data records
  • Visualisation of complex and dynamic biomedical networks
  • Disease Subtype Discovery for Precision Medicine
  • Interpretable Machine Learning for biomedicine and healthcare
  • Deep learning for biomedicine

Papers must be submitted in PDF format to https://easychair.org/conferences/?conf=mlmh2018 and formatted according to the new Standard ACM Conference Proceedings Template. Papers must not exceed 4 pages, including references.

The program committee will select the papers based on originality, presentation, and technical quality for spotlight and/or poster presentation.

The best selected student paper will receive a $1,000 travel grant. To be considered, please send a note to mlmhworkshop@googlegroups.com indicating that you would like your submission to be considered for the grant.

Key Dates

  • Paper Submission: May 25, 2018
  • Acceptance Notice: Jun 21, 2018
  • Workshop Date: Aug 20, 2018

All deadlines are at 11:59 PM Pacific Standard Time.