DBTT China Software Development Data Engineer

  • Location
    • Shanghai, Shanghai
  • Schedule: Full time
  • Posted

Description

Main Purpose of this position

The Data Engineer is responsible for translating business requirements into technology requirements and for defining data standards and principles. This position will create, develop, and design the data strategy, architectures, and roadmaps for the GMPI (General Motors Premier Import) program. The output of this position will support GMPI in data analytics, ML, system development, and integrations. In addition, this position is responsible for defining overall policies and procedures in light of legal and government regulatory frameworks and contractual constraints, as well as designing countermeasures to address risks.

Roles and Responsibilities

  • Quickly develop an understanding of the GM Premier Import (GMPI) software strategy.
  • Work with or without direct supervision, be visionary in GMPI data strategy, and take responsibility for data architecture design, roadmap, and evolution.
  • Quickly develop an understanding of business requirements and implement data architecture leveraging modern data technologies.
  • Ensure appropriate and effective data integration across multiple systems, and create a fluid, end-to-end (E2E) vision for how data will flow through the data landscape of the organizations.
  • Integrate technical functionality (e.g., scalability, security, performance, data recovery, reliability) and prepare reports where needed.
  • Own data quality assurance by implementing measures to ensure data accuracy and accessibility, and perform regular health checks on data sources.
  • Ensure data accessibility, reliability, quality, and security across different organizations.
  • Establish procedures and processes for data assets, and perform an inventory of them.
  • Collaborate with internal and external teams to manage and devise a data architecture strategy that addresses different business requirements.
  • Track, anticipate, and manage data-related technology and architecture evolution.
  • Keep abreast of technology shifts and industry initiatives to prioritize and adapt the data blueprint effectively.
  • Champion accountability within and outside the team and coordinate dependencies across teams
  • Promote a collaborative team environment that fosters creativity and innovation
  • Promote continuous team improvement, measure the team, and help the team and individuals measure themselves
  • Analyze the current business and IT environment to detect critical deficiencies and recommend areas for improvement

Required Qualifications and Experience

  • Bachelor’s degree in a technical discipline (Computer Science or Engineering)
  • At least seven years of experience (including time at college) with high proficiency in designing and engineering large-scale data analytics architecture, including building Extract, Transform, Load (ETL) processes, data pipelines, data management, and integration
  • Five or more years (including time at college) in a data engineer role designing data warehouses and/or data lakes with technologies such as Lakehouse and the Hadoop ecosystem
  • Demonstrated experience and knowledge in designing, analyzing, and troubleshooting large-scale distributed systems in cloud, non-cloud, and hybrid environments.
  • Strong understanding of distributed systems architectures and micro-service architecture.
  • Facilitate technology strategy, requirements, and architecture conversations with all stakeholders (management, business users, and technology resources) through exceptional collaboration, listening, written and verbal communication skills
  • Creative problem-solver with good communication skills
  • Ability to think strategically about technical challenges, business requirements, and solutions
  • Understand country/region regulatory requirements for cross-border data and methods for compliance
  • Assess and recommend different storage architectures, such as data warehouse, data lake, and data mart, based on the data type
  • Design ingestion-layer solutions with both batch processing and event streaming capability

Additional Requirements:

  • Proficient with Oracle, MySQL, MongoDB, Hadoop/Hive/Spark/Flink and other database modeling and management tools.
  • Experience developing software in one or more programming languages, such as Python, R, C++, Java, Shell Scripting, JavaScript, HTML/CSS, etc.
  • Proficient in using SQL, Hive SQL, Spark SQL, etc. for data warehouse development.
  • Familiar with Linux and Windows.
  • Knowledge of modern analytics data architectures, including cloud-native, microservices architecture, virtualization, orchestration, and containerization.
  • Proven expertise in modern data storage layer technologies such as Lakehouse architecture.
  • Knowledge of data warehouses versus data lakes.
  • Experience with data platform development on virtual on-premises cloud provided by Amazon Web Services (AWS), Google Cloud, Azure, or others.
  • Proficient in networking and connectivity among different data platform application clusters.
  • Data ingestion and data exposure via API.
  • Familiar with Continuous Integration/Continuous Delivery (CI/CD) tools and processes.
  • Experience with data visualization tools such as Tableau and Power BI.

Additional Description

RESPONSIBILITIES:
- Communicate and maintain Master Data, Metadata, Data Management Repositories, Logical Data Models, and Data Standards

- Create and maintain optimal data pipeline architecture

- Assemble large, complex data sets that meet functional / non-functional business requirements

- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.

- Build industrialized analytic datasets and delivery mechanisms that utilize the data pipeline to deliver actionable insights into customer acquisition, operational efficiency and other key business performance metrics

- Work with business partners on data-related technical issues and develop requirements to support their data infrastructure needs

- Create highly consistent and accurate analytic datasets suitable for business intelligence and data scientist team members

REQUIREMENTS:
- At least 3 years of hands-on experience with big data tools: Hadoop, Spark, Kafka, etc.

- Mastery of databases: advanced SQL and NoSQL databases, including Postgres and Cassandra

- Data Wrangling and Preparation: Alteryx, Trifacta, SAS, Datameer

- Stream-processing systems: Storm, Spark-Streaming, etc.

- 7 or more years of experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.

- Ability to tackle problems quickly and completely

- Ability to identify tasks which require automation and automate them

- A demonstrable understanding of networking and distributed computing environment concepts

- Ability to multi-task and stay organized in a dynamic work environment


PREFERRED:
- Data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.

- AWS cloud services: EC2, EMR, RDS, Redshift