Senior Data Engineer (Jaipur, India)

Location

Jaipur

Experience

6-8 years

Industry

AI/Technology

Job Summary

Design, build, and optimize large-scale, production-grade data pipelines and analytics platforms on Azure, leveraging Databricks, Synapse, and the broader Microsoft data ecosystem. Deliver business-critical data assets for analytics, BI, and AI/ML initiatives.

Key Technical Responsibilities

Architect modern data lakes using Azure Data Lake Storage Gen2 for batch and streaming workloads.
Build and maintain scalable ETL/ELT pipelines using Azure Data Factory and Databricks (PySpark, Scala, SQL).
Orchestrate data workflows across ADF, Databricks, and Synapse Pipelines; implement modular and reusable data pipeline components.
Develop advanced notebooks and production jobs in Azure Databricks (PySpark, Spark SQL, Delta Lake).
Optimize Spark jobs by tuning partitioning, caching, cluster configuration, and autoscaling for performance and cost.
Implement Delta Lake to provide ACID transactions on the data lake, and enable its time travel and audit features.
Engineer real-time data ingestion from Event Hubs, IoT Hub, and Kafka into Databricks and Synapse.
Transform and enrich raw data, building robust data models and marts for analytics and AI use cases.
Integrate structured, semi-structured, and unstructured data sources, including APIs, logs, and files.
Implement data validation, schema enforcement, and quality checks using Databricks, PySpark, and tools like Great Expectations.
Manage access controls using Azure AD, Databricks workspace permissions, RBAC, and Azure Key Vault integration.
Enable end-to-end data lineage and cataloging via Microsoft Purview (or Unity Catalog in multi-cloud environments).
Automate deployment of Databricks assets (notebooks, jobs, clusters) using Databricks CLI/REST API, ARM/Bicep, or Terraform.
Build and manage CI/CD pipelines in Azure DevOps for data pipelines and infrastructure as code.
Containerize and deploy custom code using Azure Kubernetes Service (AKS) or Databricks Jobs as required.
Instrument monitoring and alerting with Azure Monitor, Log Analytics, and Databricks native tools.
Diagnose and resolve performance bottlenecks in distributed Spark jobs and pipeline orchestrations.
Collaborate with data scientists, BI engineers, and business stakeholders to design and deliver scalable data solutions.
Document design decisions, create technical specifications, and enforce engineering standards across the team.

Required Skills & Experience

Hands-on experience with:
- Azure Data Lake Storage Gen2, Azure Data Factory, Azure Synapse Analytics, Azure Databricks
- PySpark, Spark SQL, advanced SQL, Delta Lake
- Data modeling (star/snowflake), partitioning, and data warehouse concepts
Strong Python programming skills and experience with workflow orchestration (ADF, Airflow, or Synapse Pipelines)
Infrastructure automation: ARM/Bicep, Terraform, Databricks CLI/API, Azure DevOps
Deep understanding of Spark internals, cluster optimization, cost management, and distributed computing
Data security, RBAC, encryption, and compliance (SOC 2, ISO, GDPR/DPDPA)
Excellent troubleshooting, performance tuning, and documentation skills