As a Lead Site Reliability Engineer (SRE), you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you'll take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you'll be focused on running better production applications and systems.
This role requires a wide variety of strengths and capabilities, including:
Engineering degree or equivalent experience, with more than 15 years of work experience
Understanding of relational and dimensional data modeling
Strong experience writing complex SQL queries, independent of database technology
Experience with Big Data tools (including HBase, HDFS, Sqoop and Spark) and data analysis is a plus
Working experience in Java programming, ideally with Kubernetes and AWS experience as well
Experience with microservices
Experience in Unix shell scripting
Well versed in the SDLC using Agile methodology
Excellent oral and written communication skills and the ability to communicate clearly with all project members and stakeholders
Experience with log analysis and monitoring tools such as AppDynamics, Geneos and Splunk, including for big data technologies
Excellent debugging and troubleshooting skills on production issues
Basic understanding of reporting tools such as Tableau, and willingness to learn these skills on the job
Design, code, test and deliver software to automate manual operational work
Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
Engage with the development team throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
Perform L1/L2/L3 support activities in SRE mode for the Production Support project, including analysis and design work and assessing the impact of requirements across all system components
Create unit test cases, perform unit testing, document test results and perform code reviews
Identify application patterns and analytics in support of better service level objectives
Design and develop self-healing and resiliency patterns
Design automated software and product upgrades, change management, and release management solutions
Participate in 24x7 support coverage as needed
JPMorgan Chase & Co., one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as any mental health or physical disability needs.