Senior Data Engineer (Jaipur, India)

Location

Jaipur

Experience

6-8 years

Industry

AI/Technology

Job Summary

Design, build, and optimize large-scale, production-grade data pipelines and analytics platforms on Azure, leveraging Databricks, Synapse, and the broader Microsoft data ecosystem. Deliver business-critical data assets for analytics, BI, and AI/ML initiatives.

Key Technical Responsibilities

  • Architect modern data lakes using Azure Data Lake Storage Gen2 for batch and streaming workloads.
  • Build and maintain scalable ETL/ELT pipelines using Azure Data Factory and Databricks (PySpark, Scala, SQL).
  • Orchestrate data workflows across ADF, Databricks, and Synapse Pipelines; implement modular and reusable data pipeline components.
  • Develop advanced notebooks and production jobs in Azure Databricks (PySpark, SparkSQL, Delta Lake).
  • Optimize Spark jobs by tuning partitioning, caching, cluster configuration, and autoscaling for performance and cost.
  • Implement Delta Lake for ACID-compliant data lakes and enable time travel and audit features.
  • Engineer real-time data ingestion from Event Hubs, IoT Hub, and Kafka into Databricks and Synapse.
  • Transform and enrich raw data, building robust data models and marts for analytics and AI use cases.
  • Integrate structured, semi-structured, and unstructured data sources, including APIs, logs, and files.
  • Implement data validation, schema enforcement, and quality checks using Databricks, PySpark, and tools like Great Expectations.
  • Manage access controls using Azure AD (now Microsoft Entra ID), Databricks workspace permissions, RBAC, and Azure Key Vault integration.
  • Enable end-to-end data lineage and cataloging via Microsoft Purview (or Unity Catalog in multi-cloud environments).
  • Automate deployment of Databricks assets (notebooks, jobs, clusters) using Databricks CLI/REST API, ARM/Bicep, or Terraform.
  • Build and manage CI/CD pipelines in Azure DevOps for data pipelines and infrastructure as code.
  • Containerize and deploy custom code using Azure Kubernetes Service (AKS) or Databricks Jobs as required.
  • Instrument monitoring and alerting with Azure Monitor, Log Analytics, and Databricks native tools.
  • Diagnose and resolve performance bottlenecks in distributed Spark jobs and pipeline orchestrations.
  • Collaborate with data scientists, BI engineers, and business stakeholders to design and deliver scalable data solutions.
  • Document design decisions, create technical specifications, and enforce engineering standards across the team.

Required Skills & Experience

  • Hands-on experience with:
    • Azure Data Lake Gen2, Azure Data Factory, Azure Synapse Analytics, Azure Databricks
    • PySpark, SparkSQL, advanced SQL, Delta Lake
    • Data modeling (star/snowflake), partitioning, and data warehouse concepts
  • Strong Python programming and experience with workflow orchestration (ADF, Airflow, or Synapse Pipelines)
  • Infrastructure automation: ARM/Bicep, Terraform, Databricks CLI/API, Azure DevOps
  • Deep understanding of Spark internals, cluster optimization, cost management, and distributed computing
  • Working knowledge of data security, RBAC, encryption, and compliance (SOC 2, ISO, GDPR/DPDPA)
  • Excellent troubleshooting, performance tuning, and documentation skills

Join Our Team

We are looking for passionate and innovative individuals ready to make a meaningful impact. Our collaborative and inclusive work environment values your ideas and fosters professional growth. Be part of a dynamic team that celebrates success and strives for excellence in every endeavor. Shape the future with us – explore our diverse range of roles and embark on a rewarding journey.