Key Responsibilities:
Data Ingestion: Design and implement data ingestion pipelines using Databricks and PySpark, with a focus on Autoloader for efficient, incremental file ingestion (see the first sketch following this list).
Nested JSON Handling: Develop and maintain processes for parsing and flattening complex nested JSON files, ensuring data integrity and accessibility (see the second sketch following this list).
API Integration: Integrate and manage data from various APIs, ensuring seamless data flow and consistency (see the third sketch following this list).
Data Modeling: Create and optimize data models to support analytics and reporting needs.
Performance Optimization: Optimize data processing and storage solutions for performance and cost-efficiency.
Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver effective solutions.
Data Quality: Ensure the accuracy, integrity, and security of data throughout the data lifecycle.
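
For context, the first item above typically looks something like the following minimal Autoloader sketch. The paths and table name (RAW_PATH, CHECKPOINT, BRONZE_TABLE) are illustrative assumptions, not references to an actual environment.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

RAW_PATH = "/Volumes/main/raw/events"                  # hypothetical source directory
CHECKPOINT = "/Volumes/main/raw/_checkpoints/events"   # hypothetical checkpoint/schema path
BRONZE_TABLE = "main.bronze.events"                    # hypothetical target Delta table

# Autoloader (format "cloudFiles") incrementally discovers newly arrived files
# and tracks schema inference/evolution in the schema location.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", CHECKPOINT)
    .load(RAW_PATH)
)

# Write to a Delta table; the checkpoint gives exactly-once processing,
# and availableNow processes all pending files and then stops.
(
    stream.writeStream
    .option("checkpointLocation", CHECKPOINT)
    .trigger(availableNow=True)
    .toTable(BRONZE_TABLE)
)
```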
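The second item, nested JSON handling, usually amounts to promoting struct fields and exploding arrays. A hedged sketch follows; the input path and the customer/orders schema are assumed purely for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.json("/Volumes/main/raw/orders")  # hypothetical input path

flat = (
    df
    # Promote nested struct fields to top-level columns.
    .withColumn("customer_id", F.col("customer.id"))
    .withColumn("customer_name", F.col("customer.name"))
    # explode() turns each element of an array of structs into its own row.
    .withColumn("order", F.explode("orders"))
    .select(
        "customer_id",
        "customer_name",
        F.col("order.order_id").alias("order_id"),
        F.col("order.amount").alias("amount"),
    )
)
flat.show()
```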
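Finally, a sketch of the third item: pulling a paginated REST API into a table. The endpoint URL, response shape, and cursor-style pagination are hypothetical, not a real service.

```python
import requests
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

records, url = [], "https://api.example.com/v1/items"  # hypothetical endpoint
while url:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()          # fail fast on HTTP errors
    payload = resp.json()
    records.extend(payload["data"])  # assumed response shape
    url = payload.get("next")        # assumed cursor-style pagination link

# Schema is inferred from the fetched records (assumes a non-empty result).
df = spark.createDataFrame([Row(**r) for r in records])
df.write.mode("append").saveAsTable("main.bronze.api_items")  # hypothetical table
```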
Qualifications:
Technical Expertise: Proficiency in Databricks, PySpark, and SQL. Strong experience with Autoloader and handling nested JSON files.
API Experience: Demonstrated experience in integrating and managing data from various APIs.
Problem-Solving Skills: Strong analytical and problem-solving abilities.
Communication Skills: Excellent communication skills to collaborate with cross-functional teams.
Experience: 3 to 5 years of experience in data engineering, data integration, and data modeling.
Education: A degree in Computer Science, Engineering, or a related field is preferred.
Preferred Qualifications:
Experience with cloud platforms such as AWS, Azure, or Google Cloud.
Familiarity with data warehousing concepts and tools.
Knowledge of data governance and security best practices.