Facebook is down

Well, most of it has already been said on this topic, but I think it’s an interesting one.

What happened?

First: it was not a hack of any sort, just a maintenance task that went wrong.

A small part of the backbone was being shut down for maintenance, as happens almost every day. During this process, a faulty command was sent that triggered a cascade of events and took down the whole backbone. At that point the DNS servers could no longer communicate with the rest of Facebook’s network, and to avoid serving outdated or faulty data they turned off their BGP advertisements, which left them virtually disconnected from the internet.
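The DNS behavior described above is a safety check that makes sense for a single site but turns into a total outage when every site fails at the same time. Here is a minimal Python sketch of that idea, assuming a simple rule (keep advertising only while at least one data center is reachable); the class, the site names and the rule itself are hypothetical, not Facebook’s actual implementation.

```python
from dataclasses import dataclass


@dataclass
class DnsSite:
    """A hypothetical edge DNS site announcing its address range via BGP."""
    name: str
    reachable_datacenters: int  # data centers this site can currently talk to
    advertising: bool = True    # whether its BGP announcement is active


def refresh_advertisement(site: DnsSite) -> None:
    # Assumed rule: a site that cannot reach any data center withdraws its
    # announcement rather than risk serving stale or wrong answers.
    site.advertising = site.reachable_datacenters > 0


sites = [DnsSite("edge-a", 12), DnsSite("edge-b", 12), DnsSite("edge-c", 12)]

# Routine maintenance: one site loses some links, every site keeps advertising.
sites[0].reachable_datacenters = 8
for s in sites:
    refresh_advertisement(s)
print([s.advertising for s in sites])  # [True, True, True]

# The faulty command severs the whole backbone: every site sees zero data centers,
# so every site withdraws at once and the DNS servers vanish from the internet.
for s in sites:
    s.reachable_datacenters = 0
    refresh_advertisement(s)
print([s.advertising for s in sites])  # [False, False, False]
```

Each site behaves sensibly on its own; the outage comes from all of them applying the same rule to the same failure.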

This created 2 outages:

Facebook’s network was unreachable
Facebook’s DNS records were no longer served to the internet

The first one took down Facebook and all its services (including WhatsApp, Instagram and Messenger, as well as “Login with Facebook”).

The second one is more interesting: all the traffic usually addressed to Facebook was still looking for its DNS records, and DNS works step by step, from one server to the next.

When you want to access facebook.com, your computer first checks whether it already knows the address (meaning: do I have a recent enough record for facebook.com?). If it doesn’t, it asks its default DNS server (usually your router), which then asks its own DNS server, and so on until the request reaches the server in charge of facebook.com; the answer then propagates back down.

This means that very few requests usually go all the way to Facebook’s DNS servers: the data is most likely cached somewhere closer to your computer.
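A toy Python model of this chain of caching servers shows why so few requests reach the top. It assumes a single fixed TTL and made-up server names, and uses 192.0.2.10 (a documentation address) as a placeholder for facebook.com; it sketches the caching idea, not a real resolver.

```python
class Resolver:
    """A toy DNS server: caches answers and forwards misses upstream."""

    def __init__(self, name, upstream=None, zone=None):
        self.name = name
        self.upstream = upstream  # next server up the chain; None means authoritative
        self.zone = zone or {}    # authoritative data: domain -> (address, ttl)
        self.cache = {}           # domain -> (address, ttl, expiry time)
        self.queries = 0          # how many queries reached this server

    def resolve(self, domain, now):
        self.queries += 1
        if self.upstream is None:
            return self.zone[domain]  # authoritative answer
        hit = self.cache.get(domain)
        if hit is not None and now < hit[2]:
            return hit[0], hit[1]     # fresh cached answer: stop here
        # Simplification: real resolvers pass on the remaining TTL, not the original one.
        address, ttl = self.upstream.resolve(domain, now)
        self.cache[domain] = (address, ttl, now + ttl)
        return address, ttl


authoritative = Resolver("authoritative", zone={"facebook.com": ("192.0.2.10", 300)})
isp = Resolver("isp-resolver", upstream=authoritative)
home_a = Resolver("router-a", upstream=isp)
home_b = Resolver("router-b", upstream=isp)

# Two households look up facebook.com once per second for 1000 seconds.
for t in range(1000):
    home_a.resolve("facebook.com", t)
    home_b.resolve("facebook.com", t)

print(home_a.queries + home_b.queries, isp.queries, authoritative.queries)  # 2000 8 4
```

In this model, 2000 lookups turn into only 8 queries to the ISP resolver and 4 to the authoritative server, because each layer answers from its cache until the TTL expires.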

During the outage, Facebook’s DNS servers effectively no longer existed, so there was no valid answer to cache: every request went back up the chain, as far as the TLD servers (.com, .fr or whatever), increasing the pressure on DNS infrastructure across the whole internet.
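The same idea in reverse: when the authoritative servers are unreachable, there is no answer to cache, so every request and every retry travels upstream. This Python sketch assumes failed lookups are not cached at all and that clients retry a fixed number of times; real resolvers may cache failures briefly and retry differently, so the numbers are only illustrative.

```python
def upstream_queries(request_times, ttl, authoritative_up, retries=0):
    """Count the queries a shared resolver sends up the DNS hierarchy.

    request_times: when clients ask for the name (seconds).
    ttl: how long a successful answer stays cached.
    retries: extra attempts a client makes after a failure (assumed value).
    """
    sent = 0
    expiry = None  # no cached answer yet
    for t in request_times:
        if expiry is not None and t < expiry:
            continue  # served from cache, nothing goes upstream
        for _ in range(1 + retries):
            sent += 1
            if authoritative_up:
                expiry = t + ttl  # a success is cached for ttl seconds
                break
            # A failure leaves nothing to cache (assumption), so the client retries.
    return sent


# One request per second for 10 minutes, answers cached for 5 minutes.
print(upstream_queries(range(600), ttl=300, authoritative_up=True))              # 2
print(upstream_queries(range(600), ttl=300, authoritative_up=False, retries=2))  # 1800
```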

If you are interested in more details, you can consult the public post-mortem.

BGP?

This one is a complex topic. I won’t go too much into the details, because I just don’t have the knowledge to give anything but an overview.

Basically, when you are in a LAN, to reach any other computer on the network your computer sends an ARP request, essentially asking “who has this IP address?”. But this only works inside the local network.

When you want to access the internet, you need to go outside of your network. For this, your computer knows a gateway that can reach the destination network (your router). I’ll skip NAT, as it’s not relevant here.
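The choice between “ask the LAN directly” and “hand the packet to the gateway” boils down to a subnet check. A minimal sketch with Python’s standard `ipaddress` module, assuming a hypothetical home network 192.168.1.0/24 with the router at 192.168.1.1; in both cases the machine then uses ARP to find the MAC address of the returned hop.

```python
import ipaddress


def next_hop(destination, interface, gateway):
    """Return where a packet for `destination` is sent first.

    interface: this machine's address with its prefix, e.g. "192.168.1.20/24".
    gateway: the default gateway, usually the home router.
    """
    local_net = ipaddress.ip_interface(interface).network
    if ipaddress.ip_address(destination) in local_net:
        return ("direct", destination)  # same LAN: ARP for the destination itself
    return ("gateway", gateway)         # elsewhere: ARP for the router and hand it over


print(next_hop("192.168.1.42", "192.168.1.20/24", "192.168.1.1"))  # ('direct', '192.168.1.42')
print(next_hop("203.0.113.7", "192.168.1.20/24", "192.168.1.1"))   # ('gateway', '192.168.1.1')
```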

At some point after your router, you reach the internet, and that’s what matters to us. On the internet you are no longer in a single network, and to know where to go to reach a destination, routers (more precisely, the border routers of each network) exchange BGP messages with their neighbors: they announce which address ranges they can reach and send keepalives at regular intervals. Going peer to peer, each router builds a map of which next hop leads to each destination.
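A very simplified Python sketch of that map: routes are learned from neighbors’ announcements, the shortest AS path wins, and a lookup picks the most specific matching prefix. Real BGP best-path selection involves many more criteria (local preference, origin, MED and so on), and the peers, prefixes and AS numbers here are made up. The last lines mimic what happened to Facebook: once the prefixes are withdrawn, no route is left and the destination is unreachable.

```python
import ipaddress


def build_table(announcements):
    """For each prefix, keep the neighbor that offers the shortest AS path.

    announcements: (neighbor, prefix, as_path) tuples received from peers.
    """
    table = {}
    for neighbor, prefix, as_path in announcements:
        net = ipaddress.ip_network(prefix)
        best = table.get(net)
        if best is None or len(as_path) < len(best[1]):
            table[net] = (neighbor, as_path)
    return table


def withdraw(table, prefix):
    """Forget a prefix, as if every neighbor withdrew it (the origin stopped announcing)."""
    table.pop(ipaddress.ip_network(prefix), None)


def next_hop(table, destination):
    """Longest-prefix match: the most specific route containing the address wins."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None  # no route at all: the destination is unreachable
    return table[max(matches, key=lambda net: net.prefixlen)][0]


table = build_table([
    ("peer-1", "10.0.0.0/8", [64501, 64510]),
    ("peer-2", "10.0.0.0/8", [64502, 64503, 64510]),
    ("peer-2", "10.1.0.0/16", [64502, 64511]),
])
print(next_hop(table, "10.1.2.3"))    # peer-2 (the /16 is more specific)
print(next_hop(table, "10.200.0.1"))  # peer-1 (shorter AS path for the /8)

withdraw(table, "10.1.0.0/16")
withdraw(table, "10.0.0.0/8")
print(next_hop(table, "10.1.2.3"))    # None
```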

It’s a bit like the highway: imagine the highway is the router, and each exit leads either to another highway or to a city. You follow the signs and take the exits that get you closer to your destination until you reach it.

In this analogy, BGP is what maintains the signs, and each city is responsible for announcing the signs that point to it. Without BGP, the signs disappear and nobody can find the way there.

Why does it matter?

Well, it’s super interesting because this event shows us how resilient, yet fragile, the internet is.

It showed that a small mistake can take down a big part of the internet (the world’s biggest social media conglomerate), how removing it impacted the rest of the internet (through the surge of DNS requests and the outage of unrelated services, such as all the websites and apps that use Facebook SSO), and how many of us depend on a single corporation without knowing it.

I know that many of the services I use depend on Cloudflare and AWS, and a similar issue at either of them would impact me greatly.

This is why the pursuit of digital independence is really important to me. What do I mean by this? Of course we will depend on GAFAM to a certain extent, and on our ISP, but I think a few technical steps can reduce this dependency. I don’t have a comprehensive guide; it’s a problem that needs constant work given how fast technology evolves. But my next steps will be:

Adding more diversity to my DNS: I use Cloudflare, and I’ll add a DNS on AWS and on Google as well (1 down, 2 more to go)
Multiple replication: I host some services, and this website is on AWS. A good step forward will be to back up the data to multiple providers and ensure that I can redeploy an exact copy on a second or third provider (including self-hosting) in a really short time, hence the need to work with infrastructure as code
As for SSO providers, I have mixed feelings. It’s a good way to keep your credentials secure while keeping things simple (most SSO providers will always provide better security than you can, accept it), but it’s also a risk, because a security issue at the one you use will compromise many of your services. It’s a balance to find; avoiding SSO for personal accounts will probably remain my way to go. I end up with many passwords and many 2FA tokens where I can’t use my YubiKeys, but it still feels more secure: if one falls, it won’t impact any other.
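For the DNS diversity point, the underlying idea is simple failover: ask several independent providers and take the first one that answers. When a domain’s NS records list name servers from several providers, resolvers already try the other servers when one fails. Below is a generic Python sketch of the pattern; the provider names are only labels and the stub functions stand in for whatever real client you would use, so none of this reflects an actual Cloudflare, AWS or Google API. The same loop applies to picking a redeployment target among several hosts.

```python
from typing import Callable, Sequence, Tuple


class ProviderDown(Exception):
    """Raised when a provider cannot answer."""


Provider = Tuple[str, Callable[[str], str]]


def first_working(providers: Sequence[Provider], query: str) -> Tuple[str, str]:
    """Try each provider in order; return (provider name, answer) from the first that works."""
    errors = []
    for name, ask in providers:
        try:
            return name, ask(query)
        except ProviderDown as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderDown("all providers failed: " + "; ".join(errors))


def down(query: str) -> str:
    # Stub simulating an outage.
    raise ProviderDown("unreachable")


def answer_from(address: str) -> Callable[[str], str]:
    # Stub that always returns the same placeholder address.
    return lambda query: address


providers = [
    ("cloudflare", down),                # simulated outage
    ("aws", answer_from("192.0.2.10")),  # 192.0.2.10 is a documentation address
    ("google", answer_from("192.0.2.10")),
]
print(first_working(providers, "example.org"))  # ('aws', '192.0.2.10')
```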
