Build data streams to ingest, load, transform, group, logically join, and assemble data ready for data analysis, analytics, and reporting.
Build data pipelines using cloud platforms such as Databricks and AWS big data services.
Write PySpark code in Databricks to connect to databases and AWS services and to transform data, as sketched below.
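For illustration, a minimal PySpark sketch of the kind of Databricks transformation described above; the S3 path, table name, and column names are hypothetical placeholders, not part of the role's actual environment:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # On Databricks a SparkSession already exists as `spark`; getOrCreate() reuses it.
    spark = SparkSession.builder.getOrCreate()

    # Hypothetical source: raw orders landed in S3 by an ingestion job.
    orders = spark.read.format("json").load("s3://example-raw-bucket/orders/")

    # Transform: type the timestamp, drop bad rows, aggregate per customer per day.
    daily_totals = (
        orders
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .filter(F.col("amount") > 0)
        .groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("daily_total"))
    )

    # Write a Delta table for the consumption layer.
    daily_totals.write.format("delta").mode("overwrite").saveAsTable("analytics.daily_totals")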
Design and implement a QA framework within the Data Lake: define test strategies, write test cases, and build test automation (see the sketch below).
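For illustration, a minimal sketch of an automated data-quality check such a QA framework might run, using pytest with PySpark; the table and column names are assumptions carried over from the sketch above:

    import pytest
    from pyspark.sql import SparkSession

    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.getOrCreate()

    def test_no_null_keys(spark):
        # Hypothetical consumption-layer table; every row must carry both keys.
        df = spark.table("analytics.daily_totals")
        nulls = df.filter("customer_id IS NULL OR order_date IS NULL").count()
        assert nulls == 0, f"{nulls} rows with NULL keys"

    def test_amounts_are_positive(spark):
        # Aggregated totals should never be zero or negative after filtering.
        df = spark.table("analytics.daily_totals")
        assert df.filter("daily_total <= 0").count() == 0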
Responsible for maintaining integrity between the Data Lake and databases.
Work on the Data Lake/Delta Lake data pipeline to take data from the source systems all the way to the consumption layer, for example using the layered pattern sketched below.
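For illustration, one common way to stage a Delta Lake pipeline from source to consumption is the bronze/silver/gold (medallion) pattern; this sketch, including its paths and columns, is an assumption rather than the team's actual layout:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Bronze: land raw source data as-is (hypothetical S3 locations throughout).
    raw = spark.read.format("json").load("s3://example-raw-bucket/events/")
    raw.write.format("delta").mode("append").save("s3://example-lake/bronze/events")

    # Silver: clean and de-duplicate the landed records.
    bronze = spark.read.format("delta").load("s3://example-lake/bronze/events")
    silver = bronze.dropDuplicates(["event_id"]).filter(F.col("event_ts").isNotNull())
    silver.write.format("delta").mode("overwrite").save("s3://example-lake/silver/events")

    # Gold: consumption-layer aggregates for analytics and reporting.
    gold = silver.groupBy("event_type").agg(F.count("*").alias("event_count"))
    gold.write.format("delta").mode("overwrite").save("s3://example-lake/gold/event_counts")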
Maintain knowledge of and proficiency with current and upcoming hardware and software technologies. Mentor junior staff in ramping up their analytical and technical skills.
SSAS plus ETL and data engineering experience is required, rather than data engineering skills alone.
Requirements:
A bachelor's degree in Computer Science or an equivalent field from an accredited college.
Strong knowledge of Databricks Data Lake/Delta Lake development.
Strong knowledge of AWS data related services (DMS, Glue, EMR, S3, Athena, Lambda, Redshift, DynamoDB, KMS).