Session ID: 

Powering Research and Analytics with a Data Lake and Hadoop

12:00pm - 1:00pm Tuesday, February 12
Orlando - Orange County Convention Center


At NYU Langone Health, we have implemented a data lake that democratizes data for analysts and researchers in a secure, scalable and governed manner. In its early stages, the data lake integrates clinical data from EHR and ancillary systems, enterprise master data and external data sets into the Hadoop data management platform. Data ingestion and data management are tied directly to business value. The data lake architecture mandates minimal data transformation, supports a wide variety of access technologies and encourages self-service analysis, while the Hadoop platform, although complex, provides resilience, security and scalability at a much lower cost than alternative approaches. A number of early proofs of concept and use cases have been successful and are helping to validate the approach. We plan to continue expanding the data lake and to integrate it with a recently procured enterprise data governance tool for metadata, data lineage and reference data.

Learning Objectives: 

  • Recognize the unique analytic needs of healthcare researchers, clinical informaticists and data scientists
  • Compare and contrast different approaches to democratizing data for researchers, clinical informaticists and data scientists
  • Discuss the benefits and challenges of using Hadoop for enterprise analytics
  • Employ the Hadoop data management platform to implement a "data lake"


Senior Director, Data Warehousing/Analytics,
NYU Langone Medical Center
Director Data Architecture and Strategy,
NYU Langone Medical Center


Clinical Informaticists
Information Management Professional