We are seeking a Customer Reliability Engineer (OpenStack Private Cloud) to join our team full time at Rackspace.
This role is responsible for the provisioning tools, configuration management systems, and installation and uptime of customer-facing infrastructure and services for Rackspace's OpenStack Private Cloud product offerings. We are looking for software engineers with a background in operations to design and implement tools that automate the provisioning, configuration, installation, and maintenance of reliable, performant provisioning systems, distributed configuration management systems, and installation and upgrade frameworks, with Continuous Deployment as the eventual goal.
- Automation - We take the approach of "If we've done this more than twice, automate it," which puts the emphasis on building tools over manual processes.
- Open Source - As co-founders of OpenStack, we have a demonstrated history of working with open source communities and contributing much of our work back.
- Reliability - Our many private clouds power customer production workloads, and customers depend on us to understand their environments and keep them running at peak performance in line with our 99.99% uptime guarantee.
- DevOps - Development and Operations coexist in our product development teams, making both first-class citizens in our infrastructure focus.
- Scale - We support numerous large-scale open source systems on behalf of our customers and are constantly pushing the limits of these systems.
- Extract and automate all duplicate or manual work contained in service delivery runbooks and playbooks for customer upgrades and maintenance
- Provide and maintain a framework for automatically provisioning, configuring, installing, and upgrading the various product offerings in a consistent manner
- Collect and provide granular metrics for the entire product lifecycle (provisioning, configuration, installation, upgrading)
- Automate capacity additions, hardware replacements, operating system upgrades, and security updates of the underlying systems with minimal to no impact on the product running on top
- Test and tune system configuration for various workloads (compute-heavy, storage-heavy, network/IO-heavy, etc.)
- Automate testing of RPC-OpenStack/OSA reference architecture deployments and upgrades, moving towards Continuous Deployment