Manager, Site Reliability Engineering - University Information Services (UIS)

Washington, D.C.
Apr 05, 2019
Jul 27, 2019
Full Time
Located in a historic neighborhood in the nation's capital, Georgetown offers rigorous academic programs, a global perspective, exciting ways to take advantage of Washington, D.C., and a commitment to social justice. Our community is a tight-knit group of remarkable individuals interested in intellectual inquiry and making a difference in the world.


The Site Reliability Engineering Manager - University Information Services (UIS) is responsible for the management, planning, operations, and business continuity of all servers, databases, High Performance Computing (HPC), and research computing systems managed by University Information Services, and for designing, developing, integrating, deploying, maintaining, and supporting the computer technologies that deliver enterprise and research applications. This position will lead the development and implementation of technical policies and procedures for managing enterprise-wide systems and applications, and will work with other organizations within and outside UIS to ensure that appropriate policies and procedures are in place and followed.

In addition, the Site Reliability Engineering Manager will serve as one of the Incident Managers in UIS and will be responsible for resolving any large-scale outage affecting university computing resources. This position is also part of the UIS Change Control Board and is responsible for approving all production changes to systems managed by UIS. This position will support and engage with the DevSecOps teams to help UIS staff complex projects with skills unique to this team. This position reports to the Chief of IT Operations and has eight full-time staff members reporting to it.

Personnel and Department Management
  • Lead the creation of staff development plans for transitioning skills from traditional systems management to code-based operations consistent with DevSecOps and Site Reliability Engineering best practices.
  • Manage (support, supervise, mentor, evaluate, train) the infrastructure team that supports the servers and research computing systems.
  • Establish policies and procedures for effective and efficient operations.
  • Manage the budget and maintain vendor contracts.
  • Evaluate and maintain the resource plan for IT Operations related to servers, databases and research computing.
  • Support the development and ongoing needs of cross-functional teams serving DevSecOps needs throughout the organization, including projects that may not directly contribute to the server team's mission.

Technical Engineering
  • Perform systems administration tasks to ensure systems are properly secured and maintained.
  • Oversee the VMware, Linux, Windows, load balancer, and cloud (AWS/GCP) systems, and repair, replace, or improve them as needed.
  • Develop and maintain service level agreements and memoranda of understanding as needed to ensure the team meets expectations for sustaining University IT systems.

  • Actively participate in setting strategic direction, including prioritization, planning, and alignment of projects with the University mission.
  • Work with the Chief of Operations, Chief Engineer, Enterprise Architect, and Chief Information Security Officer to ensure institution-wide support for and understanding of the plans, goals, and objectives of University Information Services.
  • Provide clear, timely, and appropriate communication within UIS and with other levels of the institution.

Policy Promotion and Management
  • Promote awareness among departments and researchers, and work with them to provide secure and scalable infrastructure.
  • Work with departments to define criteria for product development, and establish and enforce configuration management and change control policies and procedures.

Required Qualifications:
  • Bachelor's degree or equivalent, with relevant coursework in software engineering, programming, or a related field. Master's degree in software engineering preferred.
  • 5-10 years' experience in systems operations and engineering. Requires excellent knowledge of systems administration (especially Linux/Windows) and a thorough understanding of IP networking.
  • In-depth knowledge of security policies and procedures and of network systems appropriate to large-scale installations in university environments is essential.
  • 2-3 years' experience supporting cloud-based infrastructure (AWS or Google Cloud) is required.

Preferred Qualifications:
  • Graduate degree highly desirable
  • Experience with high performance research computing systems a plus
