Site Reliability Engineer

GovernmentCIO LLC
Washington, DC
Oct 23, 2021
Oct 25, 2021
Engineer, IT, QA Engineer
Full Time
Overview

Job title: Site Reliability Engineer Senior App Monitoring

As a Site Reliability Engineer Senior, you will focus on establishing and improving monitoring to measure end-to-end performance and end-user availability of systems via a suite of common monitoring tools. You will interface with business partners and operations teams to develop business and technical monitoring requirements. You will work with teams within the Enterprise Command Center (ECC) to assist with development and implementation of monitoring to meet business requirements, including KPIs, service mapping, dependency mapping, and alerting thresholds. You will work with other site reliability engineers and dedicated monitoring engineers to support this initiative.

Responsibilities

* Work with application owners, both business owners and operations technical teams, to establish business and technical monitoring strategies, including instrumentation of the systems, collection of metrics, development of KPIs, and configuration of alerting by static and dynamic thresholds through use of statistical analysis and machine learning.
* Utilize technical area expertise to assess, select, manage, and implement enterprise application components, and to ensure that the technical solution solves the business problem as an organic part of the organization's operational and functional baseline.
* Participate in the support of Major Incidents with Major Incident Management (MIM), Operations Triage Group (OTG), ECC, and Problem Management (PM) throughout the major incident life cycle by providing monitoring data on the system(s) in question and by addressing deficiencies in technical and business monitoring KPIs.
* Support triage efforts during Major Incidents by deconstructing application performance, interoperability, instrumentation, and human factors to facilitate resolution and development of resilient solutions.
* Support PM's enterprise root cause analysis (RCA) processes in collaboration with appropriate OI&T organizations.
* Capture technical information from the relevant stakeholders and synthesize it into useful information in various formats for OIT senior management and other VA components.
* Demonstrate proficiency with DevOps tools, JIRA, ServiceNow, and MS Project, and perform tasks using these tools.

Qualifications

Education and Experience:

* Master's Degree is preferred in Business Administration, Business Management, Computer Science, Information Systems, Information Resource Management, Industrial Engineering, Operations Research, or related fields
* 5+ years of relevant experience
* Certifications in relevant software development or analytics plus 3-5 years of relevant experience
* 8 to 10 years of relevant experience may be substituted for education (13-15 years total)

Skills:

* Experience working with business and technical leaders to develop KPIs for application monitoring.
* Experience with modern performance monitoring and diagnostics tools (examples: Splunk, Splunk ITSI, AppD, Dynatrace, SolarWinds, etc.)
* Technical expertise across multiple technology areas, with the ability to diagnose complex issues across many technologies and apply this knowledge to effective monitoring of applications.
* Must be able to provide oral and written discussion of analytical findings using narrative and graphic forms.
* Must be able to use qualitative and quantitative analytical skills to assess the effectiveness of operations.
* Ability to identify symptoms that call for process improvement.
* Analytical, investigative, and organizational skills.
* Communication skills, including the ability to craft content for executive-level presentations.
* IT background and ability to understand technical content.

We are an equal opportunity employer.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status.
