Service Reliability Director


This is a Contract position in Toronto, ON posted September 1, 2020.




Requisition ID: 89099


Join the Global Community of Scotiabankers to help customers become better off.



As a senior member of the Canadian Banking & Global Wealth Management Technology team, the System Reliability Officer (SRO) will lead and grow a team that works with senior management, peers, and business partners to continuously improve the stability, reliability and efficiency of our Global Wealth Technology systems. The team applies Site Reliability Engineering (SRE) principles and practices, including continuous people, process and technology enhancements (“automating all the things”), in support of our rapidly changing technology product portfolio. You will work cross-functionally with a variety of teams and be a core contributor to every significant engineering service or solution delivered to Canadian Banking stakeholders. You will also understand ‘what could go wrong’, solve complex problems and have a flair for communicating and leading discussions with technical and business partners. You will work directly with our Software Engineering teams to both maintain and operate our existing technology and build our next generation of technologies.


Key Accountabilities:


  • Develop, grow and lead a team of Reliability Engineers (directly and through local and global communities of practice), working closely with software development, Quality, Product and Data Engineering teams as a champion of SRE/DevOps culture and practices
  • Lead management of Service Level Objectives with senior development and business leads
  • Lead initiatives to continuously refine our build, plan and deploy practices for improved stability, reliability, efficiency, repeatability and security. You’ll create plans and collaborate with other SROs and DevOps team members, coordinating activity with development and business leads to increase service levels, lower costs, and support delivery velocity objectives
  • Work closely with development and operations teams to lead troubleshooting of our most severe incidents, leading senior stakeholder communication and driving problem-solving (e.g., log analysis, non-invasive tests) and debugging with best-practice techniques
  • Lead continuous improvement and execution of high-quality, timely major incident root cause analysis and blameless post-mortem activities to ensure we take action to avoid similar problems in the future
  • Lead prioritization of reliability features and contribute to the design, development and delivery of effective tooling, alerts, and automated responses to identify and address reliability risks.
  • Lead in-depth data analysis to gauge service trends and drive improvements.
  • Lead proactive communication of reliability, stability and efficiency results (based on Service Level Objectives), service health (via dashboards), and key reliability risks and issues to senior business and technology stakeholders, to prioritize activity (based on trend analysis) and direct investment and action
  • Enable, design and develop reliability solutions; this may include writing code and scripts to automate the provisioning and configuration of services
  • Contribute to the automated delivery of large-scale applications and continuous integration and delivery pipelines across multiple platforms
  • Assist in improving infrastructure automation, efficiency, and cost
  • Actively pursue effective and efficient operations of their respective areas, while ensuring the adequacy, adherence to and effectiveness of day-to-day business controls to meet obligations with respect to operational risk, regulatory compliance risk, AML/ATF risk and conduct risk, including but not limited to responsibilities under the Operational Risk Management Framework, Regulatory Compliance Risk Management Framework, AML/ATF Global Handbook and the Guidelines for Business Conduct.
  • Champion a high-performance environment and implement a people strategy that attracts, retains, develops and motivates the team by fostering an inclusive work environment, communicating vision, values and business strategy, and managing succession and development planning for the team.


Education and Experience


  • Top-notch engineer with leadership and systems administration experience
  • Performance- and results-oriented leadership skills with a developmental bias (coaching)
  • Excellent communication (both verbal and written). The ability to communicate confidently and clearly on conference calls, in meetings, via email, etc. at all levels of the organization is essential
  • Ability to quickly and clearly communicate incident status via email in business-friendly language
  • Experience with ITSM tools (ServiceNow a plus) and a strong understanding of SRE and service management principles
  • Strong organizational skills and the ability to effectively manage multiple tasks simultaneously
  • Ability to work in a complex and fast-paced environment
  • Ability to represent the team in meetings and presentations that include Senior Business Technology executives
  • Ability to maintain calm during stressful situations
  • Degree in Computer Science, Engineering, or equivalent experience. ITIL V3 Foundation certification in ITSM would be an asset
  • 10+ years’ experience in IT with at least 3 years in management (5+ preferred)
  • 2-3 years’ professional coding experience in one or more of the following: C, C++, Java
  • Mastery of one or more scripting languages for automating systems, e.g. Bash, Python, Ansible
  • Well-rounded, broad knowledge of OS platforms (Linux/UNIX), Networking, Web Systems and IT Ops
  • Experience working with large-scale distributed systems
  • Advanced understanding of SOA or microservices architecture
  • Experience using Jenkins, Bamboo or other CI tools
  • Advanced experience with GCP/AWS services
  • Understanding of serverless architecture (Lambda) and IaaS
  • Understanding of data structures, algorithms, best practices
  • Deep understanding of containerization using Docker or similar
  • Advanced scripting experience with Python, Bash or shell
  • Experience working in an Agile environment
  • Spanish language skills are an asset



Location(s):  Canada : Ontario : Toronto 

As Canada’s International Bank, we are a diverse and global team. We speak more than 100 languages with backgrounds from more than 120 countries. Our employees are committed to a superior customer experience and use the Bank’s six guiding sales practice principles to ensure they act with honesty and integrity.


At Scotiabank, we value the unique skills and experiences each individual brings to the Bank, and are committed to creating and maintaining an inclusive and accessible environment for everyone. If you require accommodation (including, but not limited to, an accessible interview site, alternate format documents, ASL Interpreter, or Assistive Technology) during the recruitment and selection process, please let our Recruitment team know. Candidates must apply directly online to be considered for this role. We thank all applicants for their interest in a career at Scotiabank; however, only those candidates who are selected for an interview will be contacted.