Site Reliability Engineer
Location: Salt Lake City, UT
The Site Reliability Engineer will work with the Trust Office team and is responsible for leading organization-wide efforts on dynamic platform and service performance and availability controls. Our platform runs at an incredibly high percentage of uptime and is mission critical for our business operations and customer success. Be a critical member of our team by managing the continuous availability of our platform and product services to keep our environment reliable and resilient.
This position provides technical expertise in all aspects of computing and networking uptime calculations, performance assurances, and elastic redundant systems. It will engineer, create, coordinate projects, document, present, and process platform and product data to ensure high infrastructure uptime and service performance availability.
This role is responsible for infrastructure availability management, resiliency calculations, network architecture design, software engineering in cloud environments, configuration management, automation development, and statistical modeling.
As a Site Reliability Engineer, a Typical Day Might Include the Following:
Develop calculable reliability and resiliency models to sustain enterprise infrastructure availability.
Oversee and manage the platform and product uptime reporting program.
Work with system and network operations and product teams to develop and deploy statistical models to establish a monitoring and alerting strategy.
Develop and ensure the scalability, performance, and resiliency of multiple hosting platforms and product lines.
Build, test, and deploy multiple levels of automation and replication controls to seamlessly deliver fault-tolerant availability of datacenter and cloud environments.
Restore healthy operation of platform functionality, applications, and services through sustainable incident response operations.
Design and implement server provisioning and processing scripts for secure, reliable, and continuous data flow.
Conduct and manage reliability and system performance testing; work closely with risk management teams to conduct reliability, resiliency regression, and replication testing.
Analyze, troubleshoot, and solve product performance concerns to provide continuous and sustained customer-to-product availability.
Maintain the highest level of personal certification, integrity, and objectivity, following the company Code of Ethics and NICE CXone policies and procedures at all times.
To Land This Gig You'll Need:
Bachelor's degree in Business Information Systems, Economics, Statistics, Computer Engineering, Computer Science, Information Systems Security, or a related field, or equivalent work experience, required.
5+ years of site reliability engineering experience.
3+ years of experience in or with cloud information systems replication and business continuity operations.
Experience building fault-tolerant system and application performance calculations across cloud environments.
Extensive experience with Chef, Docker, and Kubernetes.
Extensive experience coordinating with multi-disciplinary engineering teams throughout incident response operations.
Excellent skills in risk assessment processes, policy development, proposals, work statements, product evaluations, and delivery of software.
Demonstrable skills in innovating with intent to improve reliability and efficiency.
Experience analyzing and documenting post-mortem results; strong written skills are a must to deliver effective communication content throughout the company.
Strong software development and deployment skills to automate continuous availability cloud requirements.
Demonstrated strong ability to follow best practices within software development lifecycles, including integrated security and configuration testing through continuous implementation frameworks.
Expertise building and deploying in Azure and AWS.
Experience developing and consuming web services through REST and SOAP APIs.
The attributes of a qualified candidate are rational skepticism, a sense of risk appreciation, technical awareness, informed judgment, and a strong operational understanding.
Certifications in security, site reliability engineering, or a related field (one or more preferred):
DevSecOps Engineering (DSOE)
Continuous Delivery Architecture (CDA)
AWS Certified DevOps Engineer - Associate or Professional
Experience building and configuring cloud architectures, including scalability and reliability frameworks.
Strong knowledge of writing Web Service APIs.
Strong understanding of database (SQL, MySQL) administration and software (Java, Angular, and C++) technologies.
Strong understanding of SaaS and Contact Center/Telecommunication services.
Strong understanding of TLS 1.1 or above and PKI security controls.
ABOUT NICE CXone: NICE CXone makes it easy and affordable for organizations around the globe to provide exceptional customer experiences while meeting key business metrics. NICE CXone provides the world's #1 cloud customer experience platform, NICE CXone™, combining best-in-class Omnichannel Routing, Workforce Optimization, Analytics, Automation, and Artificial Intelligence on an Open Cloud Foundation. NICE CXone is a part of NICE (Nasdaq: NICE), the worldwide leading provider of both cloud and on-premises enterprise software solutions.