Powering Research and Analytics with a Data Lake and Hadoop
12:00pm - 1:00pm, Tuesday, February 12
Orlando - Orange County Convention Center
At NYU Langone Health, we have implemented a data lake that democratizes data across analysts and researchers in a secure, scalable and governed manner. In its early stages, the data lake integrates clinical data from electronic health record (EHR) and ancillary systems, enterprise master data and external data sets into the Hadoop data management platform. Data ingestion and data management are directly tied to business value. The data lake architecture mandates minimal data transformation, supports a wide variety of access technologies and encourages self-service analysis. The Hadoop platform, although complex, provides resilience, security and scalability at a much lower cost than alternative approaches. A number of early proofs of concept and use cases have succeeded and are serving to validate the approach. We plan to continue expanding the data lake and to integrate it with a recently procured enterprise data governance tool for metadata, data lineage and reference data.
Recognize the unique analytic needs of healthcare researchers, clinical informaticists and data scientists
Compare and contrast different approaches to democratizing data for researchers, clinical informaticists and data scientists
Discuss the benefits and challenges of using Hadoop for enterprise analytics
Employ the Hadoop data management platform to implement a "data lake"