Remote Principal Data Engineer (Graph Database)

S&P Global
Washington, DC
Dec 05, 2022
Dec 07, 2022
Full Time
Position Summary

We are looking for an adept, action-oriented Principal Data Engineer to operationalize our knowledge graph database and enable our soon-to-be-launched digital transformation product, which uses advanced NLP, knowledge engineering, and ML to accelerate innovation in engineering, manufacturing, and scientific operations. The ideal candidate will have strong data infrastructure and data architecture skills, a proven track record of collaborating on and iteratively implementing data-intensive solutions, strong operational skills to drive efficiency and speed, strong project leadership, and a clear vision for how data engineering can proactively create positive impact for companies. You will be part of an early-stage team. You will educate stakeholders, mentor team members, and have a significant stake in defining the future of the Data Engineering function for the product.

Job Responsibilities

- Optimize and maintain the deployment of the knowledge graph database and surrounding services in an AWS cloud environment, leveraging Kubernetes and IaC (Infrastructure as Code, declarative infrastructure)
- Work closely with data scientists, micro-service developers, and security experts to build out a big data platform incrementally and securely
- Work closely with the product management and development teams to rapidly translate the understanding of customer data and requirements into products and solutions
- Maintain an excellent understanding of the business's long-term goals and strategy, and ensure that the design and architecture are aligned with them
- Define and manage SLAs for data sets and processes running in production
- Design for disaster recovery, balancing availability and consistency in multi-region scenarios
- Research and experiment with emerging technologies and tools related to big data
- Establish and reinforce disciplined software engineering processes and best practices

Ideal Qualifications

- Experience working with knowledge graph stores (Stardog, TigerGraph, Ontotext GraphDB, Neo4j) and surrounding semantic technology (OWL, RDF, SWRL, SPARQL, JSON-LD)
- Comfort with, and ideally substantial experience in, operating big data infrastructure in a cloud-based ecosystem (AWS preferred)
- Deep understanding of the theoretical and practical tradeoffs of various data formats in object/file stores (Parquet, Avro, JSON, etc.) in combination with a variety of ETL tools (Spark, Presto, etc.)
- Deep understanding of the theoretical and practical tradeoffs of various NoSQL stores (Cassandra, Elasticsearch, DynamoDB, etc.) with respect to different read/write patterns and availability/consistency requirements
- Mastery of operating and designing stream-based data systems (Kafka, AWS Kinesis, GCP Pub/Sub, etc.), particularly under varying load
- Proficiency in modern big data architectural approaches (Kappa/Lambda architectures, Data Lake Zones, etc.)
- Experience with data pipeline and workflow management tools (AWS Data Pipeline, Apache Airflow, Argo, etc.)
- Experience with stream-processing systems (ksqlDB, Spark Streaming, Apache Beam/Flink, etc.)
- Experience with software engineering standard methodologies (unit testing, code reviews, design documents, continuous delivery)
- Ability to develop and deploy production-grade services, SDKs, and data infrastructure emphasizing performance, scalability, and self-service
- Ability to conceptualize and articulate ideas clearly and concisely
- Entrepreneurial or intrapreneurial experience where you helped lead the creation of a new product and organization

Nice to Haves

- Strong algorithms, data structures, and coding background, with programming experience in Java, Python, or Scala
- Experience working with Snowflake data warehouses and dimensional modeling practices
- BA/BS or Master's in Computer Science, Math, Physics, or another technical field
- Experience with datasets of at least 10 terabytes, ideally up to multiple petabytes

What We Offer

- Competitive base salary and bonus
- A comprehensive benefits package that includes medical, dental, vision, and life insurance plans, paid time off, a generous 401k match with no vesting period, parental leave, and 3 volunteering days each year

For more information on benefits, please access the benefits page on our careers site.

For work locations in the state of Colorado, the anticipated base salary range for this role is $140,000 - $200,000. Compensation will be determined by the education, experience, knowledge, and abilities of the applicant.

We're building a software solution that connects data in revolutionary ways, illuminating answers that were previously impossible to find and empowering our clients to envision the future so they can determine the best course of action in the present. Join us!

