Come join a growing bank at the heart of the innovation, technology, green tech and life sciences space. We continue to expand our global footprint, and our banking technology is at the core of everything we do.
As a Senior Site Reliability Engineer, you will be responsible for the performance, reliability and availability of critical applications at Silicon Valley Bank.
Responsibilities:
Be part of the team that owns the availability, performance and reliability of customer deployments
Drive adherence to SLAs through monitoring, alerting and scaling
Deploy, maintain, support and troubleshoot critical, large-scale customer infrastructure deployments in private and public cloud
Dive deep into issues and outages to establish root causes and communicate them to your business partners
Design and document automated procedures
Partner with the Security team to ensure confidentiality, integrity and availability of customer data and deployments
The ideal candidate will have experience and qualifications for planning and managing operations infrastructure, including:
Experience planning and executing site deployments (AWS, private cloud).
Expertise automating system administration tasks with scripting tools (Python or shell preferred).
Aptitude for analyzing and troubleshooting operating system, networking, configuration and performance problems.
Fundamental understanding of Internet networking protocols: TCP/IP, TLS, DNS, HTTP, SMTP.
Ability to install, configure and maintain Linux hosts and popular open source applications such as Nginx, Apache HTTP Server (httpd), Apache Tomcat, Postfix and MySQL Server.
Experience with monitoring and automation tools such as Ansible, Splunk and Zabbix.
Ability to communicate clearly with both technical and non-technical staff.
Familiarity with system hardening and server security best practices.
A bachelor's degree is required, preferably in Computer Science, Software Engineering or a related engineering discipline.
AWS certification, with 3 years of extensive hands-on experience in AWS cloud operations and experience designing and implementing complex distributed applications and infrastructure.
5 years of real-world deployment experience in core infrastructure technologies, including compute, storage, networking, databases, security and management.
Hands-on experience within the last 2 years deploying cloud solutions on AWS and other platforms.
Understanding of performance and availability requirements, and experience working with Software Engineering to define deployment, configuration and monitoring requirements.
A strong working knowledge of Linux variants
Experience maintaining complex systems in a cloud environment
Ability to create meaningful metrics and alerting for service health monitoring
Experience reducing manual effort through automation with scripting or programming languages
Skilled with configuration management and automation frameworks
Proficiency in turning root cause analyses into meaningful improvements
Experience leading troubleshooting efforts on production and non-production systems
Willingness to participate in a 24x7 on-call rotation
Experience working in a high-growth environment
Extensive AWS and Kubernetes skills
Cybersecurity experience (e.g., infrastructure, application, system or compliance)