Principal Site Reliability Engineering Lead

Oakland, CA, US, 94612

Requisition ID # 165427 

Job Category: Information Technology 

Job Level: Manager/Principal

Business Unit: Information Technology

Work Type: Hybrid

Job Location: Oakland

 

 

Department Overview

The Data Solutions Architecture Team at Pacific Gas & Electric Company is responsible for driving long-term, enterprise-wide data solutions, target state architecture, and overall excellence with the application of data, analytics and information to critical business challenges and opportunities. This team is chartered to develop the strategy, roadmap, and accompanying standards that will enable better use of data and information and to develop analytics maturity at PG&E.

 

Position Summary

The Digital Utility runs on data and information. At PG&E, we have many teams building data products that need support, and our operations teams are on the hook for ensuring reliability and support across all data products. The Principal Site Reliability Engineering Lead fills a critical role in empowering our operations teams to do their best work.

 

The Principal Site Reliability Engineering Lead will drive our operations strategy in Data Analytics & Insights (DA&I), working with operations teams to implement best practices, mentor junior engineers, drive automation, and build a continuously improving operations practice. You will work with operations management and operations engineers to create scalable DevOps practices for key data platforms at DA&I, notably Palantir Foundry, Snowflake, and Informatica. You will also get hands-on with operational problems and build out operations tooling for the team.

 

We strive for a team that will make a difference in the new PG&E. As Site Reliability Engineering Lead, you will have a direct impact on the day-to-day life of data solutions and delivery, and on the Safety of California. You will collaborate with other technical leaders and Executive Leadership to help reshape a first-class operations team, with high levels of reliability for the data products we, and our customers, rely on the most. You will work closely with supportive Operations management, a talented team in need of your guidance, and an organization looking to you to support their key products.

 

The Principal Site Reliability Engineering Lead will report to the Senior Manager of Data Solutions Architecture in the Data Analytics & Insights department of Information Technology, and work closely with the Data Ecosystem Operations team.


PG&E is providing the salary range that the company in good faith believes it might pay for this position at the time of the job posting. This compensation range is specific to the locality of the job. The actual salary paid to an individual will be based on multiple factors, including, but not limited to, specific skills, education, licenses or certifications, experience, market value, geographic location, and internal equity. We would not anticipate that the individual hired into this role would land at or near the top half of the range described below, but the decision will be dependent on the facts and circumstances of each case.


A reasonable salary range is:

Bay Area Minimum: $155,000.00

Bay Area Maximum: $265,000.00

 

Job Responsibilities

  • Technical Support and Collaboration: Provide applications engineering support to product teams. Collaborate with product teams, support teams, and customers on shared goals, cross-team projects, and new initiatives.
  • Continuous Improvement and Reliability Practices: Strive for continuous improvement in processes and reliability practices. Develop and evolve improved operations workflows.
  • Leadership and Mentoring: Show teams how to improve quality and eliminate waste by implementing improvements with them.
  • Hands-on Troubleshooting: As a member of the Operations team, you will join them on-call and be available to help with escalated issues, or issues requiring your additional experience and steady hand.
  • Operations tooling: You will build tools for improved operational workflows in collaboration with, and leading, members of the Operations team.
  • Efficiency: Identify wasteful processes and procedures. Work with teams to streamline and automate tasks.
  • Performance Monitoring and Improvement: Monitor, measure, and enhance the performance and state-awareness of systems. Identify and drive improvements in infrastructure and system reliability, performance, and monitoring.
  • Root Cause Analysis and Investigation: Lead investigations into repetitive damage and failure rates, utilizing root cause analysis techniques. Implement corrective and preventive actions based on findings.
  • Reliability and Capital Planning: Participate in annual and long-term reliability planning, ensuring alignment with operational objectives. Contribute to the development and execution of life cycle asset management processes.
  • Architecture: Own the Information Architecture and related Technical Architecture for the Operations sub-domain of the Data & Information Architecture domain.
  • Technology Life Cycle: Develop and execute strategies to introduce new capabilities needed, evolve and mature existing capabilities, and retire capabilities at their end of life.
  • Documentation and Governance: Develop and maintain architectural guidance documents and artifacts, practices and procedures, and governance to support the above.
  • Strategic Planning: Support technology strategy, planning, and roadmapping activities across IT and at the enterprise level.
  • Data Analysis and Predictive Modeling: Perform statistical data analysis. Utilize data insights for capacity planning, demand forecasting, and identifying performance bottlenecks.

 

Qualifications

Minimum:

  • Bachelor's degree in Computer Science or a job-related discipline, or equivalent experience
  • 7 years of relevant work experience in Information Technology, Data Management, Business Intelligence, and Analytics, to include experience in both IT and line of business departments


Desired:

  • Experience working directly with line of business stakeholders demonstrating job-related skills.
  • 5 or more years of experience with Site Reliability Engineering/DevOps practices.
  • Experience with analytics and data management principles such as: data acquisition and modeling, data warehousing, business intelligence, metadata management, master data management, advanced analytics and data science, “big data” techniques, public/hybrid/private cloud data management and analytics services, data security, and data and analytics governance.
  • Ability to achieve a deep understanding of line of business strategies, priorities, needs, and current capabilities.
  • Ability to work collaboratively to engage and influence business and IT stakeholders, senior leadership and external partners.
  • Customer management and negotiation skills that enable the ability to mediate opposing viewpoints and articulate the advantages of a preferred solution.
  • Excellent written and oral communication skills across all levels; ability to communicate complex technical concepts to leaders, business sponsors and stakeholders in clear, concise language that inspires confidence and earns trust.
  • Strong leadership skills in the technology and operations domain and a high level of drive, initiative and assertiveness.
  • Extensive experience with SRE/DevOps practices and tooling.
  • At least 3 years of experience developing operations automation tools in Python or another high-level scripting language commonly used on Unix systems.
  • Familiarity with at least two of: Scaled Agile, Scrum development methodology, DevOps/DevSecOps, LEAN, Six Sigma or ITIL practices.
  • Experience with any of the following: Data Architecture, Airflow, Palantir Foundry, Informatica, Spark, Snowflake, Teradata, and other database and BI technologies, data access languages such as SQL, SAS, R, Python, Scala, etc.
  • Experience working in the Utility Industry and a working knowledge of Utility concepts and challenges a plus.

Purpose, Virtues and Stands

Our Purpose explains "why" we exist:

  • Delivering for our hometowns
  • Serving our planet
  • Leading with love

Our Virtues capture "who" we need to be:

  • Trustworthy
  • Empathetic
  • Curious
  • Tenacious
  • Nimble
  • Owners

Our Stands are "what" we will achieve together:

  • Everyone and everything is always safe
  • Catastrophic wildfires shall stop
  • It is enjoyable to work with and for PG&E
  • Clean and resilient energy for all
  • Our work shall create prosperity for all customers and investors

More About Our Company

EEO
Pacific Gas and Electric Company is an Affirmative Action and Equal Employment Opportunity employer that actively pursues and hires a diverse workforce. All qualified applicants will receive consideration for employment without regard to race, color, national origin, ancestry, sex, age, religion, physical or mental disability status, medical condition, protected veteran status, marital status, pregnancy, sexual orientation, gender, gender identity, gender expression, genetic information or any other factor that is not related to the job.

Employee Privacy Notice
The California Consumer Privacy Act (CCPA) goes into effect on January 1, 2020. CCPA grants new and far-reaching privacy rights to all California residents. The law also entitles job applicants, employees and non-employee workers to be notified of what personal information PG&E collects and for what purpose. The Employee Privacy Notice can be accessed through the following link: Employee Privacy Notice

PG&E will consider qualified applicants with arrest and conviction records for employment in a manner consistent with all state and local laws.

