Senior AWS Engineer

Squarepeg Hires
Washington, DC
Apr 02, 2021
Apr 09, 2021
Engineer, IT, QA Engineer
Full Time
We are working with one of the most important financial institutions, which is rapidly growing its engineering team and looking for experienced engineers. They offer competitive salary, benefits, job security, and visa sponsorship. We are seeking Service Reliability Engineers (SRE) with expertise in AWS cloud and DevOps. In this role, you will support the entire development lifecycle to incorporate service reliability best practices and reduce downtime.

Responsibilities

Independently determine the needs of the customer while identifying and resolving conflicting or complementary needs across customer groups.
Applying advanced skill, knowledge and experience, design and develop software solutions to meet customer needs.
Use a process-driven approach to leading design solutions.
Implement new software technology and coordinate simultaneous implementation tasks across teams.
May maintain or oversee the maintenance of existing software.

Requirements

4+ years of relevant professional experience as a full-stack developer or SRE.
Work with application stakeholders and define non-functional requirements (NFRs) covering performance, scalability, availability, resiliency and reliability, including Service Level Objectives, Service Level Indicators and Error Budgets.
Develop strategies to address the non-functional requirements throughout the Software or Product Development Life Cycle.
Work with architecture and development teams in creating performant, highly resilient and reliable architecture and design using performance engineering and chaos engineering principles.
Work with architecture and development teams in implementing resiliency constructs, building fault tolerance and developing optimal code.
Develop tools and utilities to automate manual operational tasks in production.
Responsible for incidents related to NFRs: updating SOPs to capture the right set of metrics/logs for RCA, root cause analysis of the incidents, solution identification, and ensuring permanent closure of the incidents.
Analyze production utilization and incident patterns, identify improvement areas and implement automation to improve productivity and avoid manual tasks and recurring incidents.
Excellent verbal and written communication skills with experience presenting information and/or ideas to an audience in a way that is engaging and easy to understand.
Experience collaborating cross-functionally on availability / performance issues in order to identify root cause, determine areas for improvement, and drive those actions to closure through effective solutions.
Extensive knowledge of principles, advanced techniques, and theories to suggest and implement solutions on a specific project, program, or product.
Influencing skills to include negotiation, persuasion of others, meeting facilitation, and conflict resolution.
Adept at managing project plans, resources, and people to ensure successful project completion in an Agile / Scrum environment in order to facilitate the design / development of performance engineering and resiliency methodologies through collaboration with engineering and product teams to implement shift-left techniques on test design and automation.
Experience mentoring teams in the writing of performance and chaos engineering strategies and scripts with a strong emphasis on automated deployment, infrastructure automation solutions, and continuous integration and delivery processes.
Skilled as a full-stack developer with a focus on cross-platform optimization and responsiveness of applications.
Strong understanding and knowledge of Java/J2EE technologies and frameworks, UI/JavaScript frameworks, Spring Boot / Spring Cloud frameworks, REST, microservices, and server-side frameworks.
Experience in working with one of the cloud technologies (AWS, GCP or Azure).
Knowledge of cloud technologies and containerization using Docker and Kubernetes.
Excellent understanding and demonstrated experience in the use of DevOps/CICD tools like Jenkins, Jules and automated deployment tools.
Working knowledge of one of the Unix operating systems.
Automation experience with Blueprism, Selenium, or Ansible playbooks, and with programming languages like Java, Perl, Python or PowerShell scripting.
Knowledge of performance tuning of enterprise-level Java/J2EE applications (web and application server configuration, JVM parameter tuning, GC and heap size, message broker).
Experience in implementing resiliency design patterns using Hystrix, Resilience4J, Service Mesh or similar frameworks, and validation using Chaos Monkey type frameworks.
Experience with performance engineering tools: monitoring tools, performance testing tools and analysis tools.
Experience in troubleshooting performance / scalability / availability issues in a production environment.
Skilled in cloud technologies and cloud computing to include Amazon Web Services (AWS) offerings, development, and networking platforms.
Experience defining, measuring, and improving reliability metrics (SLO/SLI), observability (monitoring, logging and tracing solutions), operations processes (incident and problem management), and operations toil reduction through automation.
Experience designing, building and implementing necessary dashboards from application and infrastructure health perspectives using tools such as Splunk, Dynatrace, Datadog, etc. to provide a single-pane view of all critical business and operational information to relevant stakeholders.

by Jobble
