SRE Training Courses & Topics
Site Reliability Foundation
Introduction to the principles & practices that enable an organization to reliably and economically scale critical services.
- SRE Principles and Practices
- Service Level Objectives and Error Budgets
- Reducing Toil
- Monitoring and Service Level Indicators
- SRE Tools and Automation
- Anti-Fragility and Learning from Failure
- Organizational Impact of SRE
- SRE, Other Frameworks, The Future
- 04 Days
- Virtual
Grafana Labs
In a world overflowing with information, mastering Grafana is a game-changer. Unlock the power to turn complex data into clear, actionable insights.
- Overview and user interface
- Prometheus and Server Monitoring Dashboards
- Canvas Panel
- Telegraf and InfluxDB Datasource
- Creating Grafana Dashboards
- Cloud Monitoring
- Monitoring Websites and Docker Services
- Installing Plugins
- 01 Day
- Onsite
Site Reliability Practitioner
Automation and observability of SRE principles.
- Practical view of how to successfully implement a flourishing SRE culture in your organization
- The underlying principles of SRE and an understanding of what it is not in terms of antipatterns
- Organizational impact of introducing SRE
- SLIs and SLOs in a distributed ecosystem and extending the usage of Error Budgets
- Building security and resilience by design in a distributed, zero-trust environment
- Implementing full-stack observability, distributed tracing and Observability-driven development culture
- Curating data using AI to move from reactive to proactive and predictive incident management
- Using DataOps to build clean data lineage
- Why Platform Engineering is important in building consistency and predictability
- Implementing practical Chaos Engineering
- Major incident response responsibilities
- SRE Execution model
- 02 Days
- Virtual
Dynatrace Beginner
A step-by-step training on using Dynatrace for application and server monitoring.
- Focus on availability and performance monitoring to improve digital experiences.
- Explore the Dynatrace platform and its components, and delve into monitoring infrastructure, applications, and microservices.
- Review Dynatrace’s robust analytic capabilities to gain deeper insights into your applications, infrastructure, and user experience. Identify performance bottlenecks and proactively drive better business outcomes.
- Learn about the resources Dynatrace has available to help you analyze your environment.
- Review Dynatrace tools to increase automation and reduce manual efforts with application monitoring and performance management.
- 03 Days
- Virtual
Splunk
Covers commands, functions, and knowledge objects to provide users with actionable information about searching best practices and knowledge management.
- Working with Time
- Statistical Processing
- Comparing Values
- Result Modification
- Correlation Analysis
- Intro to Knowledge Objects
- Creating Knowledge Objects
- Creating Field Extractions
- Data Models
- 04 Days
- Virtual
Kubernetes Application Developer (CKAD)
Design, build, and deploy cloud-native applications for Kubernetes.
- Core Concepts
- Configuration
- Multi-Container Pods
- Observability
- Pod Design
- Services and Networking
- State Persistence
- Security
- Helm Fundamentals
- 05 Days
- Virtual
Cloud, CI/CD and DevOps Fundamentals
An overview of CI/CD and DevOps.
- Old-school integration pain points based on the traditional SDLC
- Solving the pain points through CI/CD
- Overview of Agile
- Understanding of code repositories (Git) and code branching
- Understanding of Pipelines
- Maturing of Continuous Deployment
- Overview of Cloud
- 01 Day
- Onsite
System Design
How to approach a design problem (e.g., designing YouTube, Meta, or Twitter).
- Load Balancing, autoscaling, replication, and failovers
- Resiliency patterns
- Stateful and stateless clustering
- Chaos Engineering
- N-tier architecture
- Microservices architecture
- Object Oriented design
- Design Patterns
- 01 Day
- Virtual
Production Management
Helps you understand the overall incident management process.
- Experience of being On-Call and learnings from incidents/outages
- Performing Root Cause Analysis (RCA)
- Blameless Post Mortem (BPM)
- Jira overview
- Incident management
- Real user monitoring
- Distributed systems monitoring
- Troubleshooting
- Network latency and failures
- 01 Day
- Virtual
Datadog Monitoring
A full guide for Datadog monitoring.
- Introduction
- Setup
- Datadog Agent
- Monitoring - Host
- Tags in Datadog
- Monitoring - Processes
- Monitoring - Containers
- Monitoring - UI
- Metrics in Datadog
- Events, Dashboards, Logs
- Alerts
- Application Performance Monitoring (APM)
- Continuous Profiling
- 02 Days
- Virtual
Kafka for Beginners
Understand Apache Kafka Ecosystem, Architecture, Core Concepts and Operations.
- Fundamentals
- CLI
- Kafka UI
- Kafka Java Programming
- Kafka Producer and configurations
- Consumer and configurations
- Extended APIs
- 01 Day
- Virtual
JMS
Learn how messaging works in JMS.
- Messaging Basics
- Anatomy of a JMS Message
- P2P Messaging
- Pub-Sub Messaging
- Message filtering
- Guaranteed Messaging
- Security
- Message Grouping
- Java EE and Message Driven Beans
- Spring JMS Overview
- 01 Day
- Virtual
Linux Fundamentals
Start with the basics of Linux and progress to more advanced concepts, including the command line, shell scripting, system administration, and network configuration.
- Overview
- Linux Directory and File Structure
- Basic Shell Commands
- Working with Files and Directories
- Setting permissions, viewing and editing files
- Introduction to editors
- Shell prompts, environment variables
- Process and Job Control (Cron)
- 01 Day
- Virtual