Rackspace

Site Reliability Engineer

Req #: 35570
Location(s): US-TX-Austin
Category: System Administration / Engineering

Job Overview

Overview & Responsibilities

ObjectRocket is a young company with big goals. We want to build the next generation of Database as a Service, and we need your help. We need people who want to build something that hasn't been done before: something hard, yet fantastically rewarding.

We are located in the heart of beautiful downtown Austin, TX. Austin is highly regarded as a wonderful place to live, work, and play. It's the Live Music Capital of the World and has a serious nightlife.

ObjectRocket has a fast-paced, exciting culture. We are a small team, and we move quickly. We are building something quite amazing and aim to be leaders in our field and community. We are growing like crazy, and we need more help!

-------------------------------------------------------------------------------------------------------------------------------

 

Job Description

As a Site Reliability Engineer, you will work with other SREs, engineers, and developers to ensure maximum performance and availability of our database services and infrastructure. Our ideal Site Reliability Engineer is familiar with both software and systems engineering and wants not just to resolve a problem but to prevent it from recurring.

Responsibilities:

  • Design and architect operational solutions for managing applications and infrastructure across data centers and cloud providers, with the specific goal of increasing the automation, repeatability, and consistency of operational tasks.
  • Design new monitoring and alerting tools that help discover failures in a timely fashion, and work with engineers to identify root causes and fix issues.
  • Provide basic network administration and troubleshooting.
  • Support and perform maintenance across product and data environments and systems.
  • Create scalable alerting and auto remediation systems.
  • Plan capacity for various services.
  • Design, write, and deliver monitors and dashboards that improve predictability and enable proactive action.
  • Handle day-to-day operational management, including response, incident, event, and problem management activities, along with tier-two support.
  • Participate in on-call rotation duties.

 

 

Qualifications

  • Experience with Linux systems administration and tuning: a minimum of 7 years of Linux systems experience, including systems administration or engineering.
  • Understanding of TCP/IP networking.
  • Experience in one or more of: Python, Ruby, Go.
  • Experience with automation tools: Puppet, Chef, Docker, Jenkins, and/or Ansible.
  • Understanding of, and hands-on implementation experience with, Docker and other container-based systems.
  • Strong passion for automation, testing, and code quality.
  • Experience with public cloud providers (AWS, Azure, Google Compute).
  • Comfort with collaboration, open communication, and remote teams.

Bonus points

  • Experience using Prometheus and Grafana.
  • Experience with cluster managers like Mesos or Kubernetes.
  • You think of infrastructure and automation as code.