Observations on Observability - #O11ycon report
August 2 | 9:39 AM PST
Today, we are going to spend some time defining observability.
August 2 | 9:46 AM PST
The engineering view of who needs observability:
Engineering
Customer support teams
Customers
August 2 | 9:53 AM PST
Structured logs are key to observability. They let you record times, servers, customers, and every other field you need to slice and dice your way to one incident in a billion.
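To make the slice-and-dice idea concrete, here is a minimal sketch of structured logging in Python. It is not from any talk at the event; the helper names (`log_event`, `filter_events`) and the field names (`server`, `customer`, `duration_ms`) are made up for illustration. Each log line is a JSON object, so any field can be used as a filter later.

```python
import json
import time


def log_event(event, **fields):
    """Emit one structured log line as JSON and return it."""
    record = {"ts": time.time(), "event": event, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


def filter_events(lines, **criteria):
    """Return the parsed records whose fields match every criterion."""
    records = [json.loads(line) for line in lines]
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Hypothetical events; in practice these would come from a log store.
lines = [
    log_event("checkout", server="web-1", customer="acme", duration_ms=120),
    log_event("checkout", server="web-2", customer="globex", duration_ms=4300),
    log_event("login", server="web-2", customer="acme", duration_ms=35),
]

# Slice by server and event type to isolate a single incident.
slow_host = filter_events(lines, server="web-2", event="checkout")
```

Because every record carries the same named fields, narrowing down to one customer on one server is a filter rather than a free-text search.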
August 2 | 10:21 AM PST
If you have a “fleet” of servers performing tasks, human oversight alone is not enough to solve problems. It can take hours to figure out what went wrong through log searches.
Distributed systems need the right kind of instrumentation to even examine the problems. And there are multiple tools to use in different places.
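As a rough illustration of what such instrumentation can look like, the sketch below times each unit of work and tags it with a shared trace ID so that events from different servers can be stitched back together. Everything here is an assumption for the example: the `span` helper, the in-memory `SPANS` list, and the server names are hypothetical, and a real deployment would send these records to a dedicated tool rather than a Python list.

```python
import time
import uuid
from contextlib import contextmanager

SPANS = []  # hypothetical in-memory sink; a real system would ship these elsewhere


@contextmanager
def span(name, trace_id, server):
    """Time a unit of work and record it under a shared trace ID."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "name": name,
            "server": server,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


def handle_request():
    """Simulate one request that crosses two (pretend) servers."""
    trace_id = uuid.uuid4().hex  # generated once, passed to every hop
    with span("frontend", trace_id, server="web-1"):
        with span("database", trace_id, server="db-3"):
            time.sleep(0.01)  # stand-in for real work
    return trace_id


tid = handle_request()
# All work done for this request, regardless of which server did it.
by_trace = [s for s in SPANS if s["trace_id"] == tid]
```

The inner span finishes first, so it is recorded first; grouping by `trace_id` recovers the full path of one request without searching each server's logs separately.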
August 2 | 3:57 PM PST
At Google, the people who write the software also run it. This is very different from typical enterprise approaches, where things are thrown “over the wall” to a separate operations team.
A healthy environment empowers developers, devops, and sysadmins rather than facilitating blame.
August 2 | 4:01 PM PST
What does “observability” mean? Charity: it’s about being able to see what’s happening in a system by looking at it from outside, without shipping new code.

