Job Responsibilities:
- Provide design and implementation expertise to a cross-functional software development team.
- Design and develop software applications from business requirements in collaboration with other team members.
- Develop scalable and reusable frameworks for ingesting and transforming large data sets.
- Work with other members of the project team to support delivery of additional project components (API interfaces, Search).
- Work with batch processes in Databricks.
- Work within an Agile delivery / DevOps methodology to deliver proof-of-concept and production implementations in iterative sprints.
- Convert existing Informatica ETL code to Databricks code where feasible or required.
- Create and maintain Databricks queries to support dashboarding and/or reporting activities.
- Develop and optimize SQL queries as required to support various reporting needs.
- Monitor and diagnose the performance of the ingest pipeline and suggest continual improvements.
Minimum Requirements:
- At least two (2) years of experience with Databricks
- Expertise in designing and deploying data applications on cloud solutions, such as Azure or AWS
- Comprehensive understanding of data management best practices, including demonstrated experience with data profiling, sourcing, and cleansing routines that use typical data quality functions such as standardization, transformation, rationalization, linking, and matching.
- Experience in building ETL / data warehouse transformation processes
- Hands-on experience in performance tuning and optimization of Python and PySpark code
- Good understanding of SQL, T-SQL, and/or PL/SQL
- Demonstrated analytical and problem-solving skills particularly those that apply to a big data environment
- Experience working with structured and unstructured data
- Experience working in an Agile environment
- Experience working with relational databases (e.g., SQL Server, PostgreSQL)
Preferred experience:
- Experience with non-relational / NoSQL data repositories (e.g., MongoDB, Cassandra, Neo4j)
- Experience designing and implementing data ingestion pipelines from multiple sources while ensuring data quality and consistency are always maintained.
- Experience with event-based / streaming technologies to ingest and process data.
- Experience working in a command-line environment and a general understanding of Red Hat Linux or another Unix-like OS.
Job Type: Full-time
Pay: $150,000.00 - $155,000.00 per year
Benefits:
- 401(k) matching
- Dental insurance
- Flextime
- Health insurance
- Health savings account
- Paid holidays
- Paid time off
- Vision insurance
Application Question(s):
- Do you have a Green Card or US citizenship?
- Do you have a Public Trust clearance?
Experience:
- Databricks: 2 years (Preferred)
Work Location: Remote