Here I present the 4th blog in the DevOps series showcasing our learning while #scalingup. Read our previous blog to know about the lessons we learnt from our scrum.
Most DevOps patterns are exhaustive and would need some strong commitments, in terms of both time and effort which is why many a times people hesitate to introduce DevOps to their teams. Because there usually is a lot of change required in your way of working and the way your workflow flows when you try to retrofit a DevOps into your setup. More so if you did so without thinking hard about the consequences. So how did we create a continuous system of delivery? How did we make a system that is robust yet flexible enough to incorporate the needed changes?
DevOps has a unique challenge. It is difficult to retrofit DevOps into a running system and doing that is not going to directly bring revenue. So in a running startup setup we simply cannot imagine stopping everything, spend time exclusively on this and then go back to our other tasks. But it does not mean that we can’t’ have a functional DevOps, it only means we have to go from MVP (minimum viable product) to MVP on this, and the earlier MVPs might seem pretty embarrassing but it is useful to prove the concept first and then go on and build the concept. Else you might end up with a sophisticated solution that is working but may or may not be what you need.
I have already talked about our thought process where we are against blindly reinventing the wheel or retrofitting someone else’s wheel onto our vehicle. The same goes for DevOps, barring obvious things, we never did any upfront development for DevOps or DevOps processes, rather we went step by step automating bits that we could and bureaucratized bits that we couldn’t. And then went back and discarded the failed MVP, and formalized the ones that have passed. And as always kept doing it. Because above all DevOps is a result of the Agile attitude more than anything else. If you are not thinking agile when you build your DevOps, then you will end up with a random bunch of automated tools, and not a living, breathing system of delivery.
So, here are our bunch of tactics that we used at different times during our evolution to achieve a successful clockwork.
The most bureaucratic of them all. Forms are the ultimate icon of a bureaucracy. Whenever you want any service done from any bureaucracy, you have to fill a form. And it definitely looks uncool. However there is a reason why it is still going strong across the world both offline and online. The primary benefit of a form is that you get all the data you need in the correct format and it is easy to find out if any required data is missing. This helps because data being missing can be addressed at the very beginning and not later on. Where we use forms is when one group of people need a service done by another group whose work is totally different from the first. For example, when developers want the deployment team to deploy something somewhere (any of our test envs or staging or production) now a developer could do it. But the deployment team owns the passwords and access to the staging and production setups which devs do not have and they also own the best practices which the dev team may not know. One of the problems they faced was that each developer had a different idea of how to push their artifacts to production. For devs of one domain, just copying code was enough, others needed binaries to be changed, still others wanted binaries and configurations to be changed. Some needed process restarts and others did not. Some needed server restarts while others did not. For some devs running an update script was enough while for others some other thing also had to be done. It was very difficult to keep track of these. One solution would be to have the deployment automated. And that is where we eventually reached. But that was not a one day job, and till that was done, we could not have stopped deployments.
So we needed to make sure that the deployment team had the right info when the request was made and not then the team had done half the deployment. We experimented with emails, but most devs wrote what they thought was relevant and not what really was relevant. So we asked developers to mention all that was needed. But that also was not too clear a requirement. And many still missed the requirements. So we created a form. The biggest advantage, is that the developer cannot now accidentally miss any important info. It is difficult to miss info when the form explicitly asks for info. So now developers fill all the info even the ones they think are obvious or irrelevant. And the deployment team was now empowered to act like a bureaucracy and reject a form for data not filled. This worked like magic, our deployments became less buggy and there was far less cases where dev and deployment teams were arguing about what they meant when they said something. The direct result of adding deployment forms was that our deployments were now far less broken.
It is the checklists that eventually gave us the idea that forms can also work. Checklists are the zero brainer tool of choice for a lot of people. And it has been so for almost a century now. Perhaps more. However the power of checklists are often underestimated or ignored. You need to read The Checklist Manifesto – by Atul Gawande, to understand how important and valuable checklists are and how crazy it is to ignore them. There was a huge list of things that we were many times not doing right or missing small or obvious steps. And no matter how much we tried we would many times repeat the same mistakes over and over again. And usually our bandwidth was too tight to go deep into each of those issues and find out what went wrong. The responsible person would say that they have done each step as it is supposed to be done. Obviously they did not think that they had done wrong. Because if they thought they were doing the steps, wrong they would not have done it the first time around. No one likes to do the same things repeatedly. Of course many such processes eventually got automated, but till they did, there needed to be a solution. Also not everything can be automated. So we created checklists for every repeated process. And placed them in prominent and easy to find places (physically and in our intranet) This also worked like magic. Soon the number of errors in repeated processes saw a huge fall. Once the errors reduced, the people handling those tasks got more time and bandwidth and they had time to fix their processes and also to automate them. This way the manual process need to be no longer done by a human. This brings us to the next step.
While it was clear to us long time back that there needs to be a lot of automations in our day to day work, there was one thing that did not allow us that immediately. And that was that automations take time and usually features for customers had highest priority and lowest available times. So when do we spend time on automations? For some time we just postponed the automation work thinking that after the present set of issues are finished, we would have time to work on automations. However we soon realized that this is a mirage and being the startup that were, getting bandwidth for this sort of thing will not be a reality any time soon. But does that mean we abandon it and the benefits that they would bring ? No, what we did was, we took time to make sure that whatever bits we could automate, we would. And make sure that we keep using it unless it was totally unusable.
We made sure we set out to automate only small bits. I call the process Micro Automations. As the name suggests, we only automate a small bit at a time. We also ensured that each bit that we automate is SMART, (specific, measurable, achievable, realistic and time bound). And also we made each bit small enough that it can be done part time by one or two people knowing very well that the main features clearly have priority.
This ensured that over time, we automate more and more bits of processes like our build systems, deployment process, automation tests etc. Bit by Bit. Over time, however we achieved a lot. Finally after all these months, we look back and marvel how much we have actually achieved. As the old adage went, we won the race, by being slow and steady.
There are many more such things that I would like to talk about, but I would like to limit myself to the bureaucratic ones that we attempted. We wanted to make sure that our focus was on the problems pCloudy was supposed to be solving for our customers, while at the same time, we ensured that we were also becoming future ready and being more and more ready to scale further as an organization. Your comments and insights are welcome. I would like to have a conversation about that in the comments section.