Google, the company behind the world’s largest search engine, recently made headlines across the tech press and general news alike. Several of its services stopped working during a massive outage on the 14th of December 2020. The outage affected many users who were trying to log in to Gmail, YouTube, Docs, and other services. It also affected devices connected to Google Home, which relies on the technology behind Google’s voice assistant. Joe Brown, a popular journo in the industry, took to Twitter to rethink smart homes.
Some would say that since 2020 hasn’t really panned out the way it should have, an outage of this scale doesn’t surprise them either. While many Google services, such as Calendar and YouTube, were affected, the search engine itself kept operating largely as normal.
What is more surprising is that, according to some sources, although the outage started at 11:50 am GMT, the automated systems at Google reported no problem with any service for the first 30 minutes. Later, at 12:25 pm, Google published an update saying that “they’re aware of the problem… that is affecting a majority of users that are unable to access the Google Services.” Outages like these make you wonder whether we are ready to trust our homes and office facilities to automated systems, and whether there is a fool-proof plan in case they fail.
A Brief Overview of the Outage
The underlying problem that caused the outage was a failure of the company’s authentication tools, a Google spokesperson said. These tools handle user logins to Google’s services. Because they could not log users in, every service that required a login became inaccessible.
At around 5:17 pm IST on the 14th of December 2020, Google’s authentication system went down for nearly 45 minutes. Spokespersons at Google traced the issue to an internal storage quota. Services that required logins returned errors throughout those 45 minutes. The issue was finally resolved at approximately 6:02 pm IST. Google apologized for the inconvenience and assured its users that it would conduct a thorough follow-up review to ensure that this does not happen again.
The Root Cause of the Google Outage
Looking deeper at the root cause, the company’s services started to fail because not enough storage space was allocated to the tools that handle authentication. The system was designed so that once storage filled up, more space would automatically be made available. That did not happen this time, and the system crashed, causing the outage. The spokespersons explained that this is much like the crashes we experience on a home desktop: when the computer’s storage is full, the system has no room left to work and fails.
The outage affected not only services such as G Suite, YouTube, Gmail, and Calendar, but also intra-office tools like Docs, Sheets, Slides, Chat, and Meet. I remember that we had to move our weekly call from Google Meet to another video conferencing platform so the meeting could go ahead. Twitter users flooded the platform with reports of the outage across all the affected services, and #YoutubeDOWN instantly became a trending hashtag.
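To make the storage-quota failure mode concrete, here is a minimal Python sketch of a store that tries to grow its quota automatically and fails loudly when it cannot. It is an illustration only and not Google’s actual system: the names `QuotaStore`, `QuotaExhausted`, `limit_bytes`, and `max_limit_bytes` are all hypothetical, and the doubling strategy is an assumption.

```python
from dataclasses import dataclass


class QuotaExhausted(Exception):
    """Raised when a write does not fit even after trying to grow the quota."""


@dataclass
class QuotaStore:
    # Hypothetical model of a quota-limited store; not Google's real design.
    limit_bytes: int
    max_limit_bytes: int = 0  # ceiling for automatic growth; 0 disables growth
    used_bytes: int = 0

    def __post_init__(self) -> None:
        if self.limit_bytes <= 0:
            raise ValueError("limit_bytes must be positive")

    def _try_grow(self, needed: int) -> bool:
        """Double the limit until `needed` fits, never exceeding the ceiling."""
        new_limit = self.limit_bytes
        while new_limit < needed and new_limit * 2 <= self.max_limit_bytes:
            new_limit *= 2
        if new_limit >= needed:
            self.limit_bytes = new_limit
            return True
        return False  # growth was not possible; the limit is left unchanged

    def write(self, size: int) -> None:
        """Record a write, growing the quota first if that is required."""
        needed = self.used_bytes + size
        if needed > self.limit_bytes and not self._try_grow(needed):
            # Fail loudly so that monitoring can alert on it, instead of
            # letting dependent services break silently.
            raise QuotaExhausted(
                f"need {needed} bytes, limit is {self.limit_bytes} bytes"
            )
        self.used_bytes = needed
```

In a sketch like this, the important design choice is that a failed expansion raises an explicit error that an alerting system can pick up, rather than assuming the automatic growth always succeeds.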
Smart Devices Went Down as Well
Devices like Google Home smart speakers, Nest thermostats, and Nest security and monitoring systems, which depend on Google’s servers, were also affected. Users could not access their devices or use the services via the app because of the login outage. Similar outages during the year at Amazon.com Inc and others have taken down doorbells, automatic cat feeders, and Roomba vacuums as well.
Outages like these make you wonder whether we are ready to rely fully on cloud services or whether we need to rethink our plans. Backing up services locally is a precautionary step worth considering for failures of this scale. Tech giants also need to ensure that all services are thoroughly tested and that they handle failures more gracefully. Instances like these also remind us how heavily users rely on Google’s services. Such heavy reliance on a single platform also carries the danger of monopoly.
Lessons to Learn
It is important to distribute services across different cloud service providers so that there is always a plan B in case of failure. Relying solely on in-house tools can also hamper the quality of testing. Bringing in an external consultant to assess the quality and testing of cloud-based services, and sourcing some services from other providers, will help tech giants minimize the damage during outages. Distributed risk works like an insurance policy: if one system crashes, the whole system doesn’t sink with it and instead hangs on to the other distributed services. This is how pCloudy, a continuous testing platform, can help you assure your apps.
The Google outage has surely taught both the tech giants and tech users that we need to be careful and better prepared. We should always have a plan in case of a failure, whether it is caused by the systems themselves or by hackers. While technology has been a boon for many in the 21st century, these outages remind us that it can also fail us.
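The plan B idea described above can be sketched in a few lines of Python: try each provider in order and move on to the next one when a call fails. This is a minimal illustration under stated assumptions, not a production failover design. The function `call_with_fallback` and the two stand-in providers `primary_login` and `backup_login` are hypothetical names invented for this example.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for two independent providers.
def primary_login() -> str:
    raise ConnectionError("authentication service unavailable")


def backup_login() -> str:
    return "session-from-backup"


if __name__ == "__main__":
    used, session = call_with_fallback(
        [("primary", primary_login), ("backup", backup_login)]
    )
    print(used, session)  # prints: backup session-from-backup
```

The order of the list encodes the preference: the primary provider is always tried first, and the backup is only used when the primary raises an error.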