
Staff Software Engineer - Site Reliability and Observability

  • Location
    • Austin, Texas
    • Roswell, Georgia
    • Warren, Michigan
  • Schedule: Full time
  • Posted
  • Job Requisition JR-202510222

Description

Work Arrangement:

Hybrid: This role is categorized as hybrid. This means the successful candidate is expected to report to either Austin, TX or Atlanta, GA at their respective innovation centers three times per week.

The Role:

The Software Engineering Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of software systems. Their job profile includes:

  • System Monitoring and Troubleshooting: Monitoring the performance and availability of software systems, identifying and resolving issues, and implementing proactive measures to prevent future incidents.
  • Automation and Infrastructure: Developing and maintaining automation tools and infrastructure to streamline software deployment, configuration management, and system monitoring.
  • Performance Optimization: Analyzing system performance, identifying bottlenecks, and implementing optimizations to improve the efficiency and scalability of software systems.
  • Incident Response and Root Cause Analysis: Responding to incidents, conducting root cause analysis, and implementing corrective actions to prevent similar incidents in the future.
  • Collaboration with Development Teams: Collaborating with software development teams to ensure that reliability and scalability considerations are incorporated into the software design and implementation.
  • Continuous Improvement: Identifying opportunities for process improvement, implementing best practices, and driving initiatives to enhance the reliability and performance of software systems.


What You'll Do

  • Implement a scalable, reliable, and secure SRE and observability platform to monitor the health of our production systems and provide a holistic view of the environment.
  • Deliver tools/software to improve the reliability, scalability and operability of services.
  • Collaborate with engineering teams to analyze and provide input on architecture, infrastructure resources, and observability to achieve reliability and scalability goals.
  • Collaborate with engineering teams on production readiness reviews, deployment, operation, and refinement.
  • Partner with stakeholders to ensure data and observability tools are effectively integrated with other systems and processes.
  • Partner with stakeholders to identify, measure and monitor availability, latency and overall service health.
  • Participate in on-call engineering duty to support production.
  • Instill site reliability best practices through automation, data insights, and observability.
  • Perform initial incident root cause analysis with engineers and carry out incident postmortems.
  • Build runbooks and tooling to carry out production support activities.
  • Actively participate in technical discussions and deep dives with the architecture group.

Your Skills & Abilities (Required Qualifications)

  • 7+ years of hands-on SRE experience (software development, systems monitoring) with at least one public cloud provider: Azure (strongly preferred), AWS, or GCP.
  • Experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring, defining alerts, writing runbooks, establishing dashboards, etc.
  • Experience with monitoring and log aggregation frameworks, such as Azure Monitor/Sentinel, Datadog (preferred), Dynatrace, Elasticsearch, Kibana, Logstash.
  • Strong working knowledge of Docker, Kubernetes, Terraform, Chef or Ansible.
  • Experience troubleshooting JVM-based applications.
  • Chaos engineering implementation experience is a big plus.
  • Extensive knowledge of Terraform as an infrastructure-as-code tool.
  • Extensive knowledge of trace monitoring and of installing and configuring OpenTelemetry.
  • Strong experience in scripting/programming: Python, Java, Go, PowerShell, Bash.
  • Experience with configuration and management of SSO and Big Data/NoSQL in cloud infrastructure.
  • Knowledge of CI/CD automation frameworks: Jenkins, Azure DevOps.
  • Strong understanding of public cloud networking components.
  • You have a story to tell about how you led and influenced a cross-organization effort to improve uptime to at least 99.99%.
  • Working experience with source control management tools, such as GitHub (preferred) or Azure DevOps.
  • Experience with an IoT stack is a big plus.
  • BS/MS in Computer Science/Engineering preferred.

This job may be eligible for relocation benefits.

A company vehicle will be provided for this role with successful completion of a Motor Vehicle Report review.


Diversity Information

General Motors is committed to being a workplace that is not only free of unlawful discrimination, but one that genuinely fosters inclusion and belonging. We strongly believe that workforce diversity creates an environment in which our employees can thrive and develop better products for our customers. We encourage interested candidates to review the key responsibilities and qualifications for each role and apply for any positions that match their skills and capabilities. Applicants in the recruitment process may be required, where applicable, to successfully complete one or more role-related assessments and/or a pre-employment screening prior to beginning employment. To learn more, visit How we hire.

Equal Employment Opportunity Statement (U.S.)

General Motors is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status.

Accommodations (U.S. and Canada)

General Motors offers opportunities to all job seekers, including individuals with disabilities. If you need a reasonable accommodation to assist with your job search or application for employment, email us at [email protected] or call us at 800-865-7580. In your email, please include a description of the specific position you are applying for, as well as the job title and requisition number of that position.

 
