UST is a leading provider of platforms, digital innovation, artificial intelligence, and end-to-end IT and business services and solutions for Global 1000 companies. We are transforming corporations through deep domain expertise, knowledge-based ML platforms, and profound anthropological efforts to understand the end customer and design products and interactions that create delight. We are deeply committed to developing a comprehensive understanding of our clients' problems and to developing platforms that address them.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that the customer's services (both internally critical and externally visible systems) have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance. Much of the engineering focuses on optimizing existing systems, building infrastructure, and eliminating manual work through automation.
On the SRE team, you'll have the opportunity to manage the unique and complex challenges of scale while using your expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, and we strive to create an environment that provides the support and mentorship needed to learn and grow.
Collaborate with software engineers on development, test, and CI infrastructure teams to improve the team's CI and CD services.
Partner with development teams by providing infrastructure assistance and guidance from the early phases of product development, including the development of software and processes to assist developers in infrastructure-related workflows (such as build, release and deployment automation).
Design and implement infrastructure for new and existing products, ensuring all business policies for security, supportability, and cost are met, while enabling efficient deployment of products through automated means.
Participate in project planning discussions, including formulating and delivering cost and labor estimates and options for assigned projects.
Design and implement solutions to provide continuous integration, automated deployment, and configuration management of internally or externally developed applications.
Analyze new and existing products for performance and efficiency improvements, both as part of a structured release process and as an ongoing process.
Monitor and tune the performance, reliability, and security of the infrastructure. Identify and correct bottlenecks in the system, while working with development teams on optimization and best practices.
Integrate internally developed products, externally developed products, and mixtures of both to create working solutions from multiple disparate parts.
SUMMARY OF REQUIREMENTS
Experience with cloud platforms such as Red Hat OpenShift, AWS, Azure, and Google Cloud, as well as on-premises data centers.
Strong programming and scripting knowledge (e.g., Groovy, Python, Ruby, PowerShell, Bash, Ansible).
Strong knowledge and understanding of tools and skills such as HashiCorp Vault, Apigee, Dynatrace, Kubernetes, Docker, Kafka, Kinesis, Sysdig, CloudWatch, CloudTrail, Lambda, SQL/PostgreSQL, information security best practices, and software security.
Ability to support and improve our tools for continuous integration, automated testing, automation, and release management, making the entirety of software engineering as efficient and effective as possible.
Experience implementing continuous delivery pipelines.
Experience with monitoring and logging systems (e.g., Splunk, ELK).
Understanding of best practices for source control, build engineering, continuous integration, and deployment.
Proficiency in the setup, configuration, maintenance, and upgrading of one or more server operating system families (Linux, Windows, etc.).
Proficiency with server prototyping and virtualization tools.
Proficiency with version control tools (Bitbucket, Git).
Experience with SDLC processes (code review, release management, etc.) and automation of the same (continuous integration, continuous factory delivery).
Experience with networking protocols (TCP/IP, SSL, etc.).
Soft skills: tenacity, communication, troubleshooting (with real-world examples of vexing problems), tolerance for frustration when working across disparate locations and time zones, and proactiveness (seeing something and saying something or, better yet, doing something).
Bachelor of Science degree in Computer Science, Computer Engineering, Electrical Engineering, Information Technology, Information Systems, Industrial Engineering, or related field; or equivalent combination of education and experience.