As a Data Engineer, your main goal is to build and maintain the systems that process and store data for the Events & Exhibitions ecosystem. You will take scattered, vendor-specific data (from registration systems, apps, marketing tools) and transform it into a unified, AI-ready dataset using a Medallion Architecture on Azure.
Think of it as organizing raw data into a structured pipeline that’s ready for analysis and machine learning.
Requirements
1. Data Ingestion & API Integration (Bronze Layer)
- Build and manage robust ETL/ELT pipelines using Azure Data Factory to ingest data from third-party vendors (REST APIs, Webhooks, SFTP).
- Ensure raw data is landed securely in Azure Data Lake Gen2 (Bronze Layer) without data loss.
- Implement error handling and logging to monitor the health of real-time and batch ingestion jobs.
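To make the Bronze-layer expectations concrete: the pattern is to pull raw vendor payloads, retry transient failures, log every attempt, and land the bytes untouched in a date-partitioned path. A minimal pure-Python sketch of that pattern (function and path names are illustrative, not part of any specific vendor API; in production this logic would live in a Data Factory pipeline or custom activity writing to Data Lake Gen2):

```python
import json
import logging
import time
from datetime import datetime, timezone
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bronze_ingest")

def ingest_with_retry(fetch, landing_dir, source_name, max_attempts=3, backoff_s=1.0):
    """Call `fetch()` (a stand-in for a vendor API pull), retrying transient
    failures, then land the raw payload unmodified in a Bronze-style path."""
    for attempt in range(1, max_attempts + 1):
        try:
            payload = fetch()
            break
        except Exception as exc:
            log.warning("attempt %d/%d for %s failed: %s",
                        attempt, max_attempts, source_name, exc)
            if attempt == max_attempts:
                raise  # surface the failure so the orchestrator can alert
            time.sleep(backoff_s * attempt)  # simple linear backoff
    # Partition by ingestion date so reruns never overwrite earlier loads.
    ts = datetime.now(timezone.utc)
    out_dir = Path(landing_dir) / source_name / ts.strftime("%Y/%m/%d")
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{ts.strftime('%H%M%S%f')}.json"
    out_path.write_text(json.dumps(payload))  # raw as received: no transformation in Bronze
    log.info("landed %s", out_path)
    return out_path
```

The key design point is that Bronze stores the payload exactly as received, so any bug found later in Silver transformations can be replayed from source without re-calling the vendor.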
2. Transformation & Modeling (Silver & Gold Layers)
- Utilize PySpark (Azure Databricks/Synapse) and SQL to clean, deduplicate, and standardize data in the Silver Layer.
- Execute Identity Resolution logic to stitch together visitor and exhibitor profiles from multiple touchpoints into a "Golden Record."
- Develop optimized data sets in the Gold Layer for high-performance reporting and predictive AI models.
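The identity-resolution step above can be illustrated with a small pure-Python sketch: records that share any normalized identifier (same email across a registration export and an app log, say) are clustered with union-find and merged into one golden record. This is a simplified stand-in for what would run as a PySpark job at scale; the field names and merge rule (first non-null value wins) are assumptions for illustration:

```python
from collections import defaultdict

def resolve_identities(records, keys=("email", "phone")):
    """Stitch visitor records that share any identifier into one profile."""
    parent = list(range(len(records)))  # union-find over record indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    seen = {}  # (key, normalized value) -> first record index seen with it
    for i, rec in enumerate(records):
        for k in keys:
            v = rec.get(k)
            if not v:
                continue
            v = str(v).strip().lower()  # standardize before matching
            if (k, v) in seen:
                union(seen[(k, v)], i)
            else:
                seen[(k, v)] = i

    # Collapse each cluster into a golden record: first non-null value wins.
    golden = defaultdict(dict)
    for i, rec in enumerate(records):
        g = golden[find(i)]
        for k, v in rec.items():
            g.setdefault(k, v)
    return list(golden.values())
```

In the real Silver layer the normalization (trim, lowercase, canonical phone format) would be applied as the deduplication pass before matching, which is why the two bullets above go together.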
3. Infrastructure & Performance Optimization
- Optimize SQL queries and Spark jobs to reduce Azure compute costs and minimize data latency.
- Maintain the Data Dictionary and technical documentation to ensure the "Engine Room" logic is transparent and scalable.
- Implement data masking and security protocols to ensure compliance with GDPR and internal policies.
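A common masking split, sketched below in pure Python (column names and the salt handling are illustrative; a real deployment would pull the salt from a secret store such as Azure Key Vault and apply this per-column in PySpark): salted hashing for identifiers that must stay joinable across tables, and outright redaction for values analysts never need.

```python
import hashlib

def mask_row(row, hash_cols=("email",), redact_cols=("phone",), salt="per-env-secret"):
    """Mask PII before data leaves the Silver layer.

    hash_cols  -> salted SHA-256 token: deterministic, so joins still work
    redact_cols -> replaced entirely; the raw value is not recoverable
    """
    out = dict(row)
    for col in hash_cols:
        if out.get(col):
            normalized = str(out[col]).strip().lower()  # hash the canonical form
            out[col] = hashlib.sha256((salt + normalized).encode()).hexdigest()
    for col in redact_cols:
        if col in out:
            out[col] = "REDACTED"
    return out
```

Because the hash is deterministic per environment, the Gold layer can still count unique visitors and join tables on the token without ever storing a raw email address.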
4. Business Enablement
- Support the Senior Data Manager in building the Semantic Layer that feeds our Power BI "Data Window."
- Collaborate with the Events Tech team to troubleshoot data discrepancies between front-end apps and back-end tables.
Technical Requirements
- Experience: 3–5 years in Data Engineering with a focus on Cloud environments.
- Core Azure Stack: Proven expertise in Azure Data Factory, Azure Synapse Analytics, and Data Lake Gen2.
- Coding: High proficiency in SQL (complex joins/optimizations) and Python/PySpark.
- Architectural Knowledge: Practical experience with the Medallion Architecture (Bronze/Silver/Gold).
- Integration: Strong experience working with REST APIs and JSON/XML data formats.
Benefits