Technologies/Skills Used
The CDAS/CDP project is a solution for managing clinical data that includes the Clinical Data Analytics Suite (CDAS) and the Digital Trial Management Suite (DTMS). A range of technologies was used to develop it, including the ERWIN 2021R2 tool, Hive, Hadoop, Scala, Spark, AWS, Data Lake, Snowflake, Airflow, Git, GitHub, HUE, and DBeaver. These technologies supported several work streams (Clinops, Patient Engagement, DTMS-CTMS, and Ops) and played a key role in designing models and tables.
For example, separate data models and tables were created in Hive, and Git and GitHub served as the change control tools, with merge requests raised for scripts. An automation tool was also developed to deploy DDLs into the different environments, which saved time and effort. Additionally, CDISC models and domain tables were created for Patient Engagement in the Hive environment.
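As an illustration only, the idea behind deploying the same DDL into several environments can be sketched as a template that is filled in per environment. The environment names, database names, table name, and columns below are hypothetical and do not come from the project. Executing the statement against Hive is deliberately left out.

```python
from string import Template
from typing import Dict

# Hypothetical environment settings; the real database names are not known.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "dev": {"database": "cdas_dev"},
    "test": {"database": "cdas_test"},
    "prod": {"database": "cdas_prod"},
}

# Hypothetical Hive DDL with placeholders for the environment-specific parts.
DDL_TEMPLATE = Template(
    "CREATE TABLE IF NOT EXISTS ${database}.${table} (\n"
    "  subject_id STRING,\n"
    "  visit_date DATE\n"
    ")\n"
    "STORED AS PARQUET"
)


def render_ddl(env: str, table: str) -> str:
    """Return the DDL text for one environment (KeyError if env is unknown)."""
    settings = ENVIRONMENTS[env]
    return DDL_TEMPLATE.substitute(database=settings["database"], table=table)


def render_all(table: str) -> Dict[str, str]:
    """Return the DDL text for every configured environment, keyed by name."""
    return {env: render_ddl(env, table) for env in ENVIRONMENTS}
```

In a setup like this, each rendered statement would be handed to whichever Hive client the environment uses, so one reviewed script can serve every environment without manual edits.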
The Big Data Factory (BDF) is also a key component of the CDAS/CDP project. BDF supports the modernization of technology and infrastructure; in this project it was used to analyze data and map it to the corresponding GDM tables. The data was analyzed and mapped, and the mapping document was updated with the transformation logic for Lift & Shift scenarios as well as new authored system scenarios. Technologies such as Hadoop, Hive, Scala, Spark, AWS, Data Lake, Snowflake, Airflow, Git, GitHub, HUE, and DBeaver were used to create data models, Mainframe copybooks, and new tables in GDM.
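A mapping document with transformation logic can be pictured as a list of rules, each pairing a source column with a target column and a transformation. The following is a minimal sketch under that assumption; the column names and transformations are hypothetical and are not taken from the actual GDM mapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MappingRule:
    """One row of a mapping document: source column, target column, transform."""
    source_column: str
    target_column: str
    transform: Callable[[str], str]


# Hypothetical rules; real source and GDM column names are not known.
RULES: List[MappingRule] = [
    MappingRule("SUBJ_ID", "subject_id", str.strip),
    MappingRule("SITE_CD", "site_code", str.upper),
    MappingRule("CNTRY", "country", lambda value: value.strip().title()),
]


def apply_mapping(record: Dict[str, str], rules: List[MappingRule]) -> Dict[str, str]:
    """Build a target record by applying each rule to its source column."""
    return {
        rule.target_column: rule.transform(record[rule.source_column])
        for rule in rules
    }
```

Keeping the rules as data rather than hard-coded logic means the same function can serve both a straight Lift & Shift (identity-style transforms) and scenarios that need new transformation logic.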