Location: Dubai, UAE
Notice Period: Immediate to max 30 days
Experience: 5 years

Key Responsibilities
- Build and maintain scalable ETL pipelines using PySpark on Cloudera Data Platform (CDP)
- Ingest data from RDBMS, APIs, and file systems into CDP
- Cleanse and transform large datasets to support business needs
- Optimize performance of PySpark jobs and Cloudera components
- Automate workflows with Apache Oozie/Airflow
- Collaborate with analysts, PMs, and engineering teams
- Ensure data quality, validation, and thorough documentation

Must-Have Skills
- Strong hands-on expertise in PySpark (RDDs, DataFrames, optimization)
- Experience with Cloudera Manager, Hive, Impala, HDFS, and HBase
- Proficiency in SQL, ETL processes, and big data technologies
- Knowledge of orchestration tools like Oozie/Airflow
- Solid scripting in Linux

Looking for someone analytical, detail-oriented, and collaborative. If you're passionate about data and ready for your next challenge, we want to hear from you.
Data Engineer • United Arab Emirates