Overview
This is a 24/7 team responsible for production systems health monitoring, deployment of code changes, escalation handling, and standardized communication of all change management within the technical operations organization. This individual multitasks and prioritizes system events according to severity and escalation procedures, and quickly and accurately communicates production emergencies to both internal and external groups. This individual navigates both Unix and Windows environments and is skilled in actively troubleshooting and/or resolving production issues. This individual also mentors junior staff members and takes the lead on outages and dashboard assignments.
Take the lead on production outages
Delegate dashboard assignments
Mentor more junior staff members
Responsible for overall daily monitoring: 24x7x365 health monitoring of Unix and Windows environments hosting various web-based, mobile, and telephony platforms, using server, network, and application monitoring systems
Manage real-time escalations and act as the point person for ensuring escalation procedures are followed and driven to resolution
Handle stressful situations, such as initiating emergency conference bridge calls and sending quick and accurate outage notifications
Provide quality control on communications for code releases, scheduled maintenance, and service interruptions
Monitor the infrastructure change management policies and procedures
Responsible for communicating between departments, vendors, and partners, acting as a central repository for information regarding production site, customer support, help desk, and core systems issues across the entire organization
Responsible for the deployment/release of engineering code across multiple environments, with all builds/releases communicated and applied to staging and production environments according to standard operating procedures
Provide application support for Unix and Windows applications, including performing various system administration tasks and performing standard operating procedures as needed to maintain system health
Work a combination of day, evening, and/or third shifts as needed
Perform other related duties as required and assigned
Demonstrate behaviors which are aligned with the organization's desired culture and values
The ideal candidate will have the following:
A bachelor's degree in Computer Science or a related technical field, or equivalent practical system administration and programming experience.
3+ years of previous operations center or equivalent experience
Must be comfortable working in command-line as well as GUI environments
3+ years of direct Unix experience (running scripts, grepping logs, troubleshooting errors)
3+ years of direct Windows experience (running scripts, processing event log messages, troubleshooting errors)
3+ years of direct VMware Horizon 7 experience (managing the environment)
3+ years of direct Commvault experience (managing the backup environment)
Hands-on experience with Amazon Web Services (AWS), Jenkins, and Chef
Hands-on experience with Ivanti Patch Management
Knowledge of Docker containerization and Kubernetes/EKS cluster management for container orchestration
Experience programming with at least one language (PowerShell, Python, Go, Ruby, or PHP) and a desire to learn more.
Knowledge of fundamental networking protocols, such as TCP/IP, HTTP, SSL, and DNS, or of Linux system internals.
An understanding of large-scale system design, monitoring, and operational practices.
Must be able to accurately report information in a timely manner
Excellent written and oral communication skills
Experience with ServiceNow, New Relic, Sumo Logic, Nagios, or Opsview is required