Codespaces.com and DevOps: Why We Should Care

Generally on this blog I tend to be far more technologist than pundit; I normally don’t write about something unless it involves doing something technically interesting.  In this case, however, I think a bit of discussion about what happened and why is in order for the architecture-minded out there.

What Happened?

Let me first say that I have no inside information here; I have never been a customer of codespaces.com and have never performed any work for them in any capacity.  The information I have is all pulled from the Google cache of their site here.  According to Codespaces.com, on Tuesday, June 17, 2014 they suffered a DDoS attack, and it appears that as a result of this attack an attacker was able to gain access to their EC2 control panel.  The attacker’s ultimate goal was extortion; however, when Codespaces attempted to regain control, the attacker still had access to the control panel and began deleting backups, storage, and instances, either strategically or indiscriminately.  Keep in mind that these are the current theories about what happened; it could turn out that the attacker already had access prior to the DDoS attack, perhaps from a previous compromise, and was simply planning the next move.

How Was This Possible?

Some will say this is because of the cloud.  I couldn’t disagree more.  Quite simply, this is because of abstraction at all levels.  When I first wrote this I felt the proper word was virtualization, but I don’t want people to read that and assume that virtualization vendors are to blame; they are only part of the problem.  The problem is three-part: (1) centralized storage, (2) OS virtualization, and (3) simplification of services.  Of course, these three "problems" have also been huge advancements in our field.  Let’s take a moment and unpack that.

Centralized Storage

We have seen massive improvements in storage technologies, and they have produced massive cost savings: we can scale one or more consolidated storage platforms instead of scaling storage across every physical server in the organization.  This doesn’t just pose a problem in the cloud, though.  Think of the damage an attacker could do with access to the administrative interface of your SAN/NAS devices.

OS Virtualization

As with storage, we have been able to minimize waste at the compute level by leveraging virtualization.  This allows us to utilize RAM and CPU more effectively and to ensure that systems are evenly utilized across the entire organization.  It has resulted in an arguably bigger cost savings than centralized storage once you factor in the software license savings for some operating systems and third-party software.  But again, the damage an attacker could do to an organization with access to this interface is immense.

Simplification of Services

Finally, services have gotten so much simpler.  Frankly, this has removed the voodoo and black arts from the IT organization, because at the end of the day IT can provide a solution for $15K, or a business user or developer can provision a service for $100/month.  The cost savings are fantastic, but removing your own experts from the equation leaves the business vulnerable to risks that the business user or developer may not fully understand.

Now cloud providers blend these three things together, which increases the value but also magnifies the risk.  That can be very bad; it can also be very good.  To ensure that we get more good than bad, we need to architect around the weaknesses in the system.  This is true for all things.  For example, a pistol has a trigger, which is used to make the gun fire.  Firing a pistol can be a very dangerous operation, especially if you remove the trigger guard (the piece of metal, or other material, that encloses the trigger to minimize accidental firing).  Now imagine that police officers didn’t have trigger guards on their pistols; I would imagine we would see much higher rates of officers shooting themselves in the foot or leg when holstering their weapons.  This is something we take for granted, but it is a very real risk.  To illustrate, take a gander at this pistol sans trigger guard and imagine carrying it in your waistband.

[Image: Paterson Colt revolver, which has no trigger guard]

Lessons Learned for AWS and Similar Providers

1) Provide better segmentation between operational services and backup services, perhaps with a grace period on storage deletions (24 to 48 hours).  Customers don’t have to wait for the provider on this one; see the sketch after this list.

2) Implement some sort of extortion protocol that the customer can request (this could, of course, be exploited as well), which would result in a full validation of all accounts and keys by AWS and the customer in tandem.
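
AWS controls the first half of that recommendation, but a customer can approximate the spirit of it today.  Below is a minimal sketch, assuming boto3 and a dedicated backup account separate from day-to-day operational credentials: it enables S3 versioning with MFA Delete on a hypothetical backup bucket, so a stolen operational key cannot, on its own, silently erase backup history.  The bucket name, MFA device ARN, and token are placeholders; also note that AWS only allows the bucket owner’s root credentials to enable MFA Delete, so this is something you set up deliberately rather than from an ordinary role.

```python
# Sketch: harden a dedicated backup bucket so routine (operational)
# credentials cannot permanently destroy backup history on their own.
# Assumes boto3 and credentials for a *separate* backup account; the
# bucket name, MFA serial, and token below are placeholders.
import boto3

BACKUP_BUCKET = "example-backup-bucket"                      # hypothetical bucket
MFA_SERIAL = "arn:aws:iam::123456789012:mfa/backup-admin"    # hypothetical device
MFA_TOKEN = "123456"                                         # current code from that device

s3 = boto3.client("s3")

# Versioning keeps old object versions around even if someone overwrites
# or "deletes" them; MFA Delete requires a fresh MFA code to permanently
# remove a version or to suspend versioning.
s3.put_bucket_versioning(
    Bucket=BACKUP_BUCKET,
    MFA=f"{MFA_SERIAL} {MFA_TOKEN}",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)
```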

Lessons Learned for DevOps Shops

1) Know your infrastructure and know its weaknesses.  Hire experts, even if only for short-term “health checks,” to make sure that the Dev doesn’t overcome the Ops.

2) If you don’t already have a full-time “Ops” person, you need one.  This should not be a DevOps person or a Dev person.  This is the person who will provide the voice of reason, and they should be architecture-minded.

3) If you are currently 100% AWS, 100% Azure, or 100% any other cloud provider, then you are 100% vulnerable to the exact same type of exploitation.  If you want to maintain a 100% cloud model, you must look at extending beyond a single provider; a minimal sketch of one way to keep a copy of your data outside the provider follows this list.

4) The final component of any DR plan (or even a simple backup plan) should be a contingency for the possibility that people missed something or, worse, were incompetent.  Simulate a failure, a failure you haven’t even contemplated; perhaps even set aside some time with your team just to brainstorm how to simulate it.
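
One way to read item 3 is: keep at least one copy of your critical data somewhere a compromised control panel cannot reach.  The sketch below is one hypothetical approach, assuming boto3 and a dedicated read-only key pair whose only job is pulling backups out of AWS to a machine at a second provider or on premises; the bucket, paths, and environment variable names are placeholders, not anyone’s real environment.  The point is the separation of powers: the credentials that copy your backups out should not be able to delete anything, and the destination should be invisible to the account you are protecting against.

```python
# Sketch: pull copies of backup objects out of AWS to storage that a
# compromised AWS control panel cannot touch (another provider, colo,
# or an office NAS). Uses a dedicated, read-only key pair so this job
# itself cannot delete anything in the source bucket. All names are
# hypothetical.
import os
import boto3

SOURCE_BUCKET = "example-backup-bucket"   # hypothetical AWS bucket
LOCAL_ROOT = "/mnt/offsite-backups"       # storage outside the AWS account

s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["BACKUP_PULL_KEY_ID"],        # read-only key
    aws_secret_access_key=os.environ["BACKUP_PULL_KEY_SECRET"],
)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith("/"):       # skip folder placeholder objects
            continue
        dest = os.path.join(LOCAL_ROOT, obj["Key"])
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # Skip objects we already hold at the same size; good enough for a sketch.
        if os.path.exists(dest) and os.path.getsize(dest) == obj["Size"]:
            continue
        s3.download_file(SOURCE_BUCKET, obj["Key"], dest)
```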

Lessons Learned for IT Professionals

1) Traditional IT invented the DevOps model; this is our fault.  Our lack of flexibility and responsiveness has left us open to having solutions built by whoever said they could build them.

2) Be more flexible; quit driving the business into partially thought-out models.

3) Advise the business of the risks associated with any solution (cloud or not), and, most importantly, of the mitigations for those risks.  The fact is that cloud services are very attractive for a lot of reasons; find a way to incorporate them into the architecture without betting the business on them.

4) A repeat from the DevOps list, because it matters here too: the final component of any DR plan (or even a simple backup plan) should be a contingency for the possibility that people missed something or, worse, were incompetent.  Simulate a failure, a failure you haven’t even contemplated; perhaps even set aside some time with your team just to brainstorm how to simulate it.  A simple restore drill, sketched below, is a good place to start.
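
To make that contingency concrete, a restore drill should prove the backups are actually usable, not just present.  The sketch below is a deliberately simple, hypothetical drill, assuming you keep a manifest of SHA-256 checksums alongside each backup set: after restoring into a scratch directory, it re-hashes every file and reports anything missing or corrupt.  The file layout and manifest format here are assumptions for illustration, not a standard.  If the drill can’t complete, you’ve learned something far more cheaply now than during a real incident.

```python
# Sketch of a restore drill: after restoring a backup set into a scratch
# directory, verify every file against a SHA-256 manifest written at
# backup time. The manifest format ("<hex digest>  <relative path>" per
# line) and the paths used here are assumptions for illustration.
import hashlib
import os
import sys

RESTORE_DIR = "/tmp/restore-drill"          # where the drill restored the backup
MANIFEST = "/tmp/restore-drill/MANIFEST"    # checksums recorded at backup time

def sha256_of(path: str) -> str:
    """Hash a file in chunks so large backups don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

failures = 0
with open(MANIFEST) as manifest:
    for line in manifest:
        if not line.strip():
            continue
        expected, rel_path = line.strip().split(maxsplit=1)
        full_path = os.path.join(RESTORE_DIR, rel_path)
        if not os.path.exists(full_path):
            print(f"MISSING  {rel_path}")
            failures += 1
        elif sha256_of(full_path) != expected:
            print(f"CORRUPT  {rel_path}")
            failures += 1

print(f"Drill complete: {failures} problem(s) found.")
sys.exit(1 if failures else 0)
```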

Lessons Learned for Customers of Cloud Providers

1) This failure was one of numerous types of failures that could have occurred.  The bottom line is that you need to take responsibility for your data.

2) If your business actually depends on something, you need to fully understand the risk of allowing someone else to control it.

The Verdict

The bottom line here is that codespaces.com does bear responsibility in this situation, though the mistakes that were made are very easy ones to fall into, even for a seasoned IT professional.  We need to be careful not to throw the baby out with the bath water.  The fact of the matter is that a crime was committed, and it should be fully investigated and prosecuted, with as much recovered for codespaces.com customers as possible to make them whole again.  One final point: the folks who worked at codespaces.com have just gone through a very difficult, frankly once-in-a-lifetime process, and they have learned lessons that your best person hasn’t yet learned.  Do not hesitate to hire them because of this situation.  I personally think that, given a little time to process what happened, they will be very valuable resources for understanding how to ensure this doesn’t happen again (in other words, someone you want on your side if you utilize a cloud model, and frankly even if you don’t).  After all, I can think of thousands of mistakes (big and small) that have taught me lessons, but I really can’t think of any successes that have.

A final thought to leave you with…  Comfort breeds more of the same; Pain breeds change.