What's this Downhound thing?

The short answer: accurate, comprehensive, unbiased Internet outage reports.

The long one: First some background. Most businesses and organizations these days run their applications and websites on a cloud computing platform. The most popular of these is Amazon Web Services.

If you're a cloud engineer, your job is to keep your apps and websites running flawlessly, all the time, for all your users. But that is not an easy job, because a lot of components go into any app or website. You have the code that your own company has written, as well as the underlying cloud services that it depends on to work. These cloud services can include things like storage, to show images to your users, or payment systems, to process credit card purchases.

When something goes wrong, the first question a cloud engineer asks themselves is: is it my code or config -- or is a service that my code depends on experiencing an outage?

Downhound's mission is to help you answer that question: is it you, or a cloud service?

To do that, Downhound runs thousands of software bots that continually track the performance and availability of over 200 Amazon Web Services products in 25 regions worldwide, every minute of every day. Our bot army regularly checks nearly 5000 AWS endpoints (URLs), noting whether they're responsive. If an endpoint is unresponsive, slow, yields an error, or has an invalid certificate, we track that information.
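To make the four failure categories concrete, here is a minimal Python sketch of what a single check could look like. It is an illustration only, not Downhound's actual bot code: the names `classify` and `check_endpoint`, the 2-second "slow" cutoff, and the choice to count only HTTP 5xx responses as errors are all assumptions made for this example.

```python
import ssl
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Assumed cutoff for "slow"; the real threshold is not stated in this document.
SLOW_THRESHOLD_SECONDS = 2.0


def classify(status_code, elapsed_seconds, error=None):
    """Map the outcome of one probe to a status label (hypothetical labels)."""
    if isinstance(error, ssl.SSLCertVerificationError):
        return "invalid_certificate"
    if error is not None:
        # Timeouts, DNS failures, refused connections, and so on.
        return "unresponsive"
    if status_code >= 500:
        # Assumption: a 4xx (e.g. 403 for an unsigned request) still means
        # the endpoint answered, so only 5xx counts as an error here.
        return "error"
    if elapsed_seconds > SLOW_THRESHOLD_SECONDS:
        return "slow"
    return "ok"


def check_endpoint(url, timeout=5.0):
    """Probe one URL and return a record timestamped to the second (UTC)."""
    started = time.monotonic()
    status_code, error = None, None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status_code = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        status_code = exc.code
    except urllib.error.URLError as exc:
        # URLError wraps the underlying cause (such as a certificate failure).
        error = exc.reason if isinstance(exc.reason, Exception) else exc
    except OSError as exc:
        # Covers socket timeouts raised outside of URLError.
        error = exc
    elapsed = time.monotonic() - started
    return {
        "url": url,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "elapsed_seconds": round(elapsed, 3),
        "status": classify(status_code, elapsed, error),
    }
```

Calling `check_endpoint` with any endpoint URL returns one timestamped record. Keeping `classify` separate from the network call means the categorization rules can be tested without touching the network.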

This gives you fine-grained details of what's going wrong with any AWS service. Every event, timestamped to the second. It gives you the detail you need to begin to find the root cause of any app or website failures.


Now you might be thinking: what about the AWS Service Health Dashboard?

The problem is bias.

The AWS Health Dashboard covers up a lot of failures. Go ahead, check it now. It probably shows that many services are running normally. And yet, our experience is that AWS has failures regularly, across all services and regions.

This is not an indictment of AWS or their engineering teams. Many of these failures are minor hiccups, which your code should be written to handle. And running nearly 200 services in 25 regions worldwide -- about 5000 endpoints (URLs) -- is not an easy task.

Even so, you need detailed failure information from a neutral third party -- like Downhound. Not a high-level green checkbox from a company whose financial incentive is to not show every single failure.


You might also ask: what about DownDetector?

The problem is accuracy and depth.

DownDetector relies on users to report outages, and even then only provides high-level trends. Their data is useless for trying to debug your application or website code or configuration. Anyone on the Internet -- even the technically incompetent -- can press their "I have a problem" button, whether or not there is an actual issue. So you cannot trust DownDetector to provide accurate guidance on whether a problem lies with you or your cloud provider.


To recap:

Downhound is accurate: it relies on software bots, not some random web user.

Downhound is comprehensive: it tracks all available AWS services in all regions, 24/7.

Downhound is unbiased: our incentive is to provide accurate information, not cover up outages.


As a new service, Downhound is far from perfect. And AWS is just the beginning. Please let us know how we can make it better to help you do your job as a cloud engineer. Thank you.


Made with ❤️ in San Francisco

© Downhound