Position Overview: 

We are seeking a highly skilled and motivated Data Engineer with a strong background in Machine Learning (ML) projects and extensive experience working with Databricks and Apache Spark. As a key member of our data team, you will play a crucial role in designing, implementing, and optimizing data pipelines, frameworks, and infrastructure to support ML initiatives. The ideal candidate should possess a deep understanding of data engineering concepts, excellent programming skills, and a passion for leveraging cutting-edge technologies to drive innovative ML solutions.


Responsibilities:

  • Data Pipeline Development: Design, develop, and maintain robust, scalable, and efficient data
    pipelines using Databricks and Spark to extract, transform, and load both structured and
    unstructured data from various sources.
  • ML Infrastructure: Collaborate with data scientists and ML engineers to build and maintain the
    infrastructure required for ML model training, evaluation, and deployment. This includes setting
    up distributed computing environments, managing clusters, and optimizing resource allocation.
  • Data Quality and Integrity: Ensure data integrity and quality throughout the data lifecycle by
    implementing data validation checks, error handling mechanisms, and data monitoring
    processes.
  • Performance Optimization: Identify performance bottlenecks in data pipelines and Spark jobs
    and work on optimizations to enhance processing speed and efficiency.
  • Data Governance: Implement security protocols, access controls, and data governance policies
    to maintain data privacy and compliance with industry standards and regulations.
  • Collaboration: Work closely with cross-functional teams, including data scientists, software
    engineers, and business analysts, to understand ML requirements, provide data engineering
    support, and deliver successful ML projects.
  • Automation and Scalability: Develop automated solutions for repetitive tasks, and design data
    engineering systems that can scale seamlessly to handle large volumes of data.
  • Troubleshooting and Support: Monitor data pipelines and Spark jobs, proactively identify
    issues, and provide timely support to maintain system availability and performance.
  • Continuous Learning: Stay up to date with the latest advancements in data engineering, ML,
    and cloud technologies, and actively apply this knowledge to improve existing systems and
    processes.

Qualifications:

  • Bachelor’s degree or higher in Computer Science, Engineering, or a related field.
  • Minimum of 4 years of relevant experience as a Data Engineer working on ML projects with a
    strong focus on Databricks and Spark.
  • Proven hands-on experience with Databricks and Apache Spark for data processing and ML
    workloads in a cloud environment.
  • Solid understanding of data engineering principles, data modeling, ETL processes, and data
    warehousing concepts.
  • Proficiency in programming languages such as Python, Scala, or Java, and experience with SQL for
    data manipulation and querying.
  • Familiarity with cloud platforms such as AWS, Azure, or Google Cloud Platform, and
    understanding of cloud-based data storage and computing services.
  • Prior experience in building and managing ML infrastructure, including ML frameworks (e.g.,
    TensorFlow, PyTorch), model versioning, and deployment.
  • Familiarity with other big data technologies such as Hadoop, Hive, or Kafka is a plus.
  • Strong analytical and problem-solving abilities with a keen eye for detail and a proactive
    approach to identifying and resolving issues.
  • Excellent communication and teamwork skills to collaborate effectively with diverse teams and
    convey complex technical concepts to non-technical stakeholders.