As a Site Reliability Engineer, you will work with other SREs, Engineers, Developers and our support & operations teams to ensure maximum performance, reliability and automation of our Managed Kubernetes deployments and infrastructure on top of Azure / AWS.
We recognize that manual approaches to operations do not scale, and we have a dedicated Site Reliability Engineering team to tackle the significant problems of managing many discrete Private Cloud and Public Cloud Kubernetes deployments, with multiple offerings and form factors, at scale worldwide.
Our Site Reliability Engineer is someone who is familiar with both software and systems engineering and who wants not just to resolve a problem but to prevent it from recurring. You should have excellent written and verbal communication skills, and you should be comfortable operating in a fast-paced environment.
You will be working with many new and cutting-edge technologies, such as Kubernetes, Docker and LXC containers, software-defined networking, security tools, and other Cloud Native Computing Foundation projects, as well as our extended platform support for Managed Kubernetes on top of Azure / AWS.
In addition to resolving issues and automating their fixes internally and downstream, you will submit patches upstream whenever a problem is better served by a fix in the upstream Open Source code, improving the operational and reliability aspects of the upstream projects.