We all know monitoring our APIs' health and performance is critical, but what metrics are most important to look at? And what happens when metrics aren't enough? This week on the API Intersection podcast, we interviewed David O'Neill, founder and CEO of APImetrics. The APImetrics platform for API and Cloud Service performance offers API monitoring and reporting for all, from the developer to the customer success team.
What API Metrics Should I Be Tracking?
We're all quick to reach for the number of API calls when reporting on our APIs. Still, a variety of other metrics at our disposal paint a better picture of the program's overall health and performance. Consider average and max latency, uptime, API usage growth, unique API consumers, and API calls per business transaction, to name a few.
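As a rough illustration, several of these metrics can be derived from the call records a gateway or monitoring tool already collects. The sketch below assumes a hypothetical `CallRecord` shape (consumer ID, endpoint, latency, status code, and a business transaction ID); none of these names come from APImetrics, and the success rate is only a crude stand-in for uptime.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CallRecord:
    """One logged API call. The field names are assumptions for this sketch."""
    consumer_id: str
    endpoint: str
    latency_ms: float
    status: int
    transaction_id: str


def summarize(calls):
    """Summarize a batch of call records into a handful of health metrics."""
    if not calls:
        raise ValueError("need at least one call record")

    latencies = [c.latency_ms for c in calls]
    # Count only 2xx responses as successes.
    successes = sum(1 for c in calls if 200 <= c.status < 300)
    transactions = {c.transaction_id for c in calls}

    return {
        "total_calls": len(calls),
        "avg_latency_ms": mean(latencies),
        "max_latency_ms": max(latencies),
        "success_rate": successes / len(calls),
        "unique_consumers": len({c.consumer_id for c in calls}),
        "calls_per_transaction": len(calls) / len(transactions),
    }
```

Calling `summarize` on a day's worth of records gives one row for a report; comparing rows over time is one simple way to track usage growth.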
"The raw number of calls you make isn't necessarily useful. It's actually more so about what calls were they and what-to-what endpoints, and what were they actually doing that made them work?" shares David. "The question we asked customers is how do you know those calls work? Were all those calls valid passes?"
David stresses that the more data you have, the better, when determining your APIs' success. Many use the gateway as the source of truth, but that shouldn't be the end of the story. Looking holistically at all of the API calls tells you something, but actively verifying that the API products themselves work the way they're expected to will tell you more. Your APIs should always be looked at collectively as products and business services.
Looking Beyond the Metrics
But it's also essential to consider the data that DOESN'T make it all the way to the gateway.
"We're talking to several security companies about this because API security vendors tend to fall into the bucket of looking at all of the data, all the traffic, and then drawing conclusions based on the traffic. But, what about the traffic that doesn't get to the gateway?" shares David.
He emphasized that many teams watch what happens on the network but forget to actually verify that the security measures they expect to be working really are. What if your gateway or security systems for the API aren't working as they should? Are you missing out on all those metrics and traffic? Is there nefarious activity happening outside of the security checks on your API?
His analogy is a Ring camera on your front door: you can see everyone who comes to the door, but if the door is left unlocked, you have no idea how many people actually went through it. The door may be monitored, but that's not the whole picture.
"If it's unlocked, we could just walk in and go through your house, and that's the same with an API security platform…you can check that nobody's trying to do anything nefarious, but something could still be going on outside of that," shares David.
For example, the Optus breach in Australia wasn't really a hack. The API did precisely what it was built to do; it just happened to offer a way to access people's PII. The Optus team wasn't checking that you couldn't call that endpoint and get a 200 with valid PII back because they hadn't thought of it. They weren't verifying that the doors and windows were locked because everything seemed to operate correctly under their usual security checks.
In this case, the API had a fundamental design mistake. However, it was still passing under their monitoring metrics and security checks, so it didn't matter how many metrics were saying the API was operational and effective.
"That's the sort of thing that we want people to be more aware of, which is trust, but verify. If you think you're doing things right, don't rely on passive monitoring of all of the data to alert you if something has gone wrong. Actively verify that you're doing the things you say you are," shares David.
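One way to act on "trust, but verify" is a small active check that calls an endpoint without credentials and confirms the request is refused. The sketch below is only an illustration, not how Optus or APImetrics do it: the function names and the URL in the usage note are hypothetical, and it assumes the API signals rejection with a 401 or 403 status.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Send a GET request with no credentials and return the HTTP status code."""
    request = urllib.request.Request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we want here.
        return err.code


def door_is_locked(url, fetch=fetch_status):
    """Return True only if an unauthenticated call is rejected.

    `fetch` is injectable so the check can be exercised without a network.
    """
    return fetch(url) in (401, 403)
```

Run on a schedule against something like `https://api.example.com/customers/123` (a placeholder URL), a `False` result means the door is unlocked, even if every passive dashboard still shows green.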
How to Actively Verify Your API Program Works
Often, APIs tend to fail creatively: they look like they're on and working, but they are not actually doing anything, or they have a security flaw like the one in the example above. APIs have a remarkable ability to disguise their actual failure modes, so it takes due diligence, regular monitoring, and a true understanding of your API inside and out to avoid API failure.
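A response that looks healthy but does nothing can be caught by checking the content of the response, not just the status code. The following is a minimal sketch under assumed conditions: the API returns a JSON object, and `required_fields` is a hypothetical list of fields that a working response must populate.

```python
import json


def check_response(status, body, required_fields):
    """Return a list of problems; an empty list means the call really worked."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")

    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        problems.append("body is not valid JSON")
        return problems

    if not isinstance(payload, dict):
        problems.append("body is not a JSON object")
        return problems

    # A 200 with missing or empty fields is a failure in disguise.
    for field in required_fields:
        if payload.get(field) in (None, "", [], {}):
            problems.append(f"missing or empty field: {field}")
    return problems
```

The design choice here is to return every problem found rather than stop at the first one, so a single synthetic call can report both a bad status and an empty payload.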
Outside of monitoring metrics and platforms, you can avoid a lot of these API mistakes by getting the design right the first time (we like to call this the design-first approach). Take the foundational steps to include all relevant stakeholders in the design process, do regular design reviews, utilize style guides for consistency, and revisit the design of your API every once in a while to understand where room for improvement could be. The design-first approach combined with the "API as a Product" strategy we touched on above will help you avoid many errors that even a monitoring platform could otherwise overlook.
"Looking at this with API as a product ability will help show you what you're monitoring and different approaches to understanding that data and where the fail point is in a set of APIs," shares David. "I also love looking at the concept of the APIs as a supply chain, which is something I think people and companies are really just starting to dive into. Your API endpoint is just part of a vast network of things you may have little control over!"
I enjoyed learning more about the APImetrics team and David's thoughts on how we can all get better at monitoring and understanding the data produced by our API programs. For more insights from industry leaders, check out the API Intersection podcast.