Rackspace

Site Reliability Engineer - OpenStack Private Cloud

Req #: 37110
Location(s): US-TX-San Antonio, US-TX-Austin, US-Remote
Category: OpenStack Private Cloud, Software Development

Job Overview

Overview & Responsibilities

We are seeking a Site Reliability Engineer (OpenStack Private Cloud) to join our team full time at Rackspace.

 

The person in this role is responsible for the provisioning tools, configuration management systems, and installation and uptime of customer-facing infrastructure and services for Rackspace's OpenStack Private Cloud product offerings. We are looking for software engineers with an operations background to design and implement tools that automate provisioning, configuration, installation, and maintenance. That work spans reliable, performant provisioning systems, distributed configuration management systems, and installation and upgrade frameworks, with Continuous Deployment as the eventual goal.

 

Highlights: 

  • Automation - Our approach is "If we've done this more than twice, automate it," which puts the emphasis on building tools over manual processes.
  • Open Source - As co-founders of OpenStack, we have a demonstrated history of working with open source communities and contributing much of our work back.
  • Reliability - We run many private clouds that power customer production workloads. Customers depend on us to understand their environments and keep them running at peak performance, in line with our 99.99% uptime guarantee.
  • DevOps - Development and Operations coexist in our product development teams, making both first-class citizens in our infrastructure focus.
  • Scale - We support numerous large-scale open source systems on behalf of our customers and are constantly pushing the limits of these systems.

 

Responsibilities: 

  • Extract and automate all duplicate or manual work contained in service delivery runbooks or playbooks for customer upgrades and maintenance
  • Provide and maintain a framework for automatically provisioning, configuring, installing, and upgrading various product offerings in a consistent manner
  • Collect and provide metrics for the entire product lifecycle process (provisioning, configuration, installation, upgrading) at a granular level
  • Automate capacity additions, hardware replacements, operating system upgrades, and security updates of the underlying systems with minimal to no impact on the product running on top
  • Test and tune system configuration for various workloads (compute heavy, storage heavy, network/IO heavy, etc.)
  • Automate testing of RPC-OpenStack/OSA reference architecture deployments and upgrades, moving toward Continuous Deployment

 

Qualifications

Required Skills: 

  • OpenStack operational experience required 
  • Knowledge of Ansible is preferred 
  • Strong knowledge of low-level provisioning technologies (iPXE, automated BIOS updates, Dracut/initramfs, in-place reboots/upgrades of the OS)
  • Strong experience in large fleet automation driven by centralized, versioned configuration/change management
  • Ability to troubleshoot networking issues in distributed systems
  • Experience bringing software to production at large scale
  • Fanatical focus on automation and instrumentation 
  • Ability to decompose complex systems and understand potential failure scenarios 
  • Contributions to Open Source projects

 
