This job has expired

Systems/Software Reliability Engineer (Multiple Positions - Senior, Mid and Junior)

Employer: Advancement Strategy, LLC
Location: Arlington, VA
Closing date: Sep 17, 2019

Industry: Engineering
Function: Engineer, Software Developer, Systems Administrator, IT
Hours: Full Time
Career Level: Experienced (Non-Manager)

This role requires strong experience in site reliability engineering and operational management, including software development using multiple languages and operating systems. Your expertise in analyzing and troubleshooting complex IT ecosystems, including the design of large-scale distributed systems, is a key asset in our development of software for systems or network automation. The technical skills to debug and optimize code and to automate routine tasks using a systematic problem-solving approach are critical. The successful candidate must have effective communication skills and be a strong team player.

In this role you will participate as a member of our clients' internal Infrastructure team and will:
Use your technical acumen and thought leadership to create infrastructure-as-a-service assets for our clients.
Use your development experience to contribute to our clients' internal Infrastructure engineering teams that develop, deploy, and operate these services.
Use your operational knowledge of distributed systems to improve the consistency, reliability, and performance of our clients' ecosystem of technology services.

Key Attributes:
Advise: Provide thought leadership to multiple client engineering teams in designing and developing systems that are resilient and high performing on a global scale.
Automate: Build tools and systems to support the operation of infrastructure and services.
Optimize: Surveil and improve performance, reduce cost, and improve end-user experience.
Diagnose: Use your knowledge of distributed systems to identify and fix network, system, and service-level issues.

Required Qualifications:
5-10 years' experience in large-scale systems or software engineering environments
3+ years' experience in infrastructure, operations, or site reliability engineering
Extensive experience in the development, operation, and management of high-traffic frontend and backend systems
Strong experience in three or more of the following: C, C++, Java, JavaScript, Python, Go, Perl, Ruby, or SQL, with a strong working knowledge of Linux, Unix, and TCP/IP
Strong experience with algorithms, data structures, complexity analysis, and software design
Experience using software or product life cycle methodologies and tools
Experience designing, analyzing, and troubleshooting large-scale distributed systems
Experience debugging and optimizing code to automate routine tasks or RPA
Experience troubleshooting applications, networking (TCP/IP), and distributed systems
Outstanding communication and collaborative skills as a team player
Skilled at engaging in technical thought leadership discussions to improve the lifecycle of services, from inception through design, deployment, operation, and refinement
Skilled at supporting production services in pre-go-live activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
Skilled at maintaining production services in post-go-live activities such as measuring and monitoring availability, latency, and overall ecosystem health
Know how to plan the scale of systems sustainably through mechanisms like RPA and evolve systems by pushing for changes that improve reliability and velocity
Practitioner of sustainable incident response and postmortem processes
Occasionally work at and travel to multiple client locations globally (~20%)
Ability to attend video meetings and to complete and deliver assigned work products remotely (~80%)
BS or MS degree in Computer Science, Software Engineering, Information Systems, or a related technical field involving code-level engineering

Additional Qualifications:
Experience working in an environment that applies infrastructure-as-code principles
Experience with infrastructure-as-code processes (via Terraform, CloudFormation, etc.)
Experience with Docker or Kubernetes in a production environment
Operational knowledge of Amazon Web Services or Google Cloud Platform
Operational knowledge of Postgres or Cassandra
Operational knowledge of configuration management systems (Puppet, Chef, Salt, etc.)
