When you write on a corporate blog, you always tend to adopt that corporate / institutional tone, saying everything and nothing with cold, not very compelling words. Unlike the big American corporations and the small companies that mimic them, we want to share, in an empathetic way, professional experiences from real life, so that you can live real situations in the third person and learn from the experiences of others. Because in a world made up of companies, customers, suppliers and mere VAT numbers, there are stories of people that deserve to be told.
Today I want to talk to you about backup and data security, telling you what I have to say about an all too underestimated problem.
If something can go wrong, it will go wrong.
This pseudo-scientific axiom, better known as Murphy's law, has always been the cornerstone of our organizational processes for managing security and backups.
With this awareness, over the last 6 years we have evaluated and implemented robust, proven backup solutions to ensure the integrity of customer data.
Operating under the old Dreamsnet.it brand since 2005, with earlier systems-administration experience going back to 2000, until then we had always used an incremental snapshot backup solution that allowed selective restores of both individual files and entire images. At the database level, for example, we used (and still use) tools that take snapshots quickly while respecting hot-backup logic and integrity, that is, backing up the DB without having to shut it down and keeping the service available at all times, even if the backup runs at 4 a.m. (users may be asleep, but search engines keep indexing).
In short, not exactly the last in the class, considering that even today well-known Italian hosting providers still rely on the ancient and ancestral mysqldump. Really unbelievable.
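To make the idea of a hot backup concrete, here is a minimal sketch, assuming a MySQL/MariaDB server and Percona XtraBackup as the snapshot tool (the tool, paths and credentials are illustrative assumptions, not necessarily what we run in production):

```python
# Minimal sketch of a "hot" database backup: the DB stays online and keeps
# serving queries while a consistent physical copy is taken. Percona
# XtraBackup is used here purely as an example of such a tool; paths and
# credentials are placeholders.
import subprocess
from datetime import datetime

TARGET_DIR = f"/backup/mysql/{datetime.now():%Y-%m-%d_%H-%M}"  # hypothetical path

# Step 1: copy the data files while MySQL keeps running (no downtime).
subprocess.run(
    ["xtrabackup", "--backup", f"--target-dir={TARGET_DIR}",
     "--user=backup", "--password=secret"],   # placeholder credentials
    check=True,
)

# Step 2: apply the redo log so the copy is transactionally consistent
# and ready to be restored at any moment.
subprocess.run(
    ["xtrabackup", "--prepare", f"--target-dir={TARGET_DIR}"],
    check=True,
)
```

The point is that both steps run while the database keeps serving queries; no downtime is ever required to obtain a consistent copy.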
In thirteen long years, and after several hundred restored backups, no customer has ever lost a single file of their projects.
Then, in the summer of 2018, something new, unexpected and unsettling happened, something that would make us question ourselves, and anyone who hears this story, forever after that ugly misadventure.
It was a sunny day, warm but not muggy, short-sleeved T-shirt and shorts, notebook over my shoulder. I had just finished lunch at the river park next to my home in Arad, Romania. I remember I was walking back to the office at the Arad Business Center, along the city's beautiful bike path, very popular at that hour with all those who, like me, were on their lunch break.
The view towards the river inspired peace, the sight of people relaxing on the benches conveyed calm and positivity; I was literally immersed in that moment of bliss when my mobile phone started ringing, picking up a call forwarded directly from the main office.
Back to planet Earth, the one made of other people's problems to solve (after all, that's generally what work is for, right?), and I answer warmly:
Hello, I'm Marco from Managed Server, how can I help you?
A male voice replies: a guy roughly my age, between 30 and 35 I would have estimated by ear. We immediately skip the formalities of the Italian language and put ourselves at ease by addressing each other informally right away (by the way, on the net and on social networks Netiquette says we address each other with the informal "tu", did you know?).
He starts, somewhat agitated, talking to me about backups, restores, failed restores, lost data and data recovery. Too many abstract and confusing concepts, too many fragmented inputs, and a lot of confusion in my head. I don't understand.
Who is this person calling me? Is he one of our customers?
What happened?
He's talking about backups, but does he need to restore a backup? Did he lose a backup? Does he not have a backup?
Could it be one of those kids who can't tell the difference between a washing machine and a VCR?
I immediately stop this flood of random terms and phrases and ask him to calmly explain everything that happened, from the beginning, and to tell me what he needed.
We start again much more calmly and finally establish a conversation made up of sentences that make sense and, above all, are logically connected.
In simple terms
This guy, hosted by a well-known French hosting company, said that the night before he had made a mistake with some production files via FTP and, frustrated while trying to fix the problem, had decided to make a "clean sweep": delete everything and restore the backup from the day before.
So he connects with his FTP client, deletes all the folders of his site and, once finished, launches the backup restore procedure from the convenient web interface in the customer area.
Four clicks and you're done!
After a few minutes of progress, a message announced that the restore operation was complete.
Oh, joy, glee and jubilation! What better news?
What happened is easy to understand and, as you have already guessed, the backup was damaged, or rather EMPTY. The restore wizard had restored exactly ZERO FILES, leaving the destination directory completely empty instead of containing the files from the previous day.
He doesn't get discouraged, doesn't give up, and tries again. Maybe something just went wrong.
Same procedure, same archive, same restore message. Then he checks with the FTP client and… EMPTY. Again, not a single file. Nothing, nada, zero, nisba.
He goes back to the interface and selects the backup from the day before; same restore procedure. Same successful completion message, same result: EMPTY.
He selects the backup from two days before. Restore. EMPTY.
And so on with the backup from three days before, four days before, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 days before.
Same result: EMPTY.
The archives to restore had run out, and there was only one conclusion: not one of the 30 backups, going back 30 days, could be restored. This person had no way to restore a backup of the site that fed him.
The problem had become really big, since it apparently had no solution.
At that moment I felt a mix of clashing emotions. Sadness mixed with joy.
You know when you experience some bad event, directly or indirectly, and out of empathy you feel sadness about what happened, but deep down also a pinch of joy in knowing that the bad thing didn't happen to you? Exactly that feeling: a mix of emotions at the antipodes, colliding with each other.
I was happy that he was not a customer of ours; I would not have known how to face a situation where the backup service exists but none of the archives can be restored. What could I have said to a PERSON on the phone with me who lives off that site and feeds his family with it?
I immediately thought that this moral and ethical problem, that sense of responsibility, would not have touched in the least the huge hosting company this person was a customer of.
In the end it was clear how the story would go: this person would talk to telephone support, some guy paid a few coins a month, who would forward the problem to the Italian technical department, which would forward it to the French one and to one of its technicians. The technician would probably notice that something was wrong and send the ticket back to Italian technical support, which would probably confirm that the data was gone; at that point the customer could only ask for a refund or open a legal dispute.
A legal dispute. In Italy. That alone would be funny, if the situation were not so tragic. Instead, it makes you cry when you think back to the signing of a contract and an SLA (written in fonts so tiny they should be illegal), in which the supplier indemnifies itself against any damage of this kind and any claim for compensation.
It would have been easier to climb Everest with your bare hands.
The CEO of the company would never even hear about the plight of this family man, desperate at having lost 5 years of work. One among millions of customers, a drop in the ocean. What does a problem like this matter to a company that makes hundreds and hundreds of millions in revenue a year, with millions and millions of customers? Absolutely nothing. Unfortunately.
What could we do?
Nothing. Faced with such a complex situation, absolutely beyond my and our room for maneuver (the supplier was someone else; we didn't know what backup system they used, what the storage system was, how to get access to the media, or why this nasty mishap had happened), what could we do?
The most sensible thing to do was obviously to open a discussion with their technical support to understand whether there was any possibility of recovery, and in the meantime to check whether, BY ANY CHANCE, he had a local backup on his PC, maybe from a few days or months before.
The first suggestion did not produce any positive result: support limited itself to saying that there were no files inside the backup, which had nonetheless been performed "correctly". This statement was also confirmed, after about two weeks, by their top-level French technical support, who simply dismissed the case with a "there is nothing". An absolutely grotesque and unpleasant situation.
Things went a little better with the advice to look for a local backup: with a lot of dedication and plenty of rummaging through the files on his PC, he managed to find and restore a backup from a few months before. It was not the optimal way to repair such damage, but it allowed this person to get back on track and avoid the bankruptcy that would otherwise have been inevitable.
All's well that ends well!
The lesson we have learned
In this whole ugly story, our role was purely that of a spectator: absolutely irrelevant to the outcome, which was fortunately resolved, if not for the best, certainly not for the worst.
All this, however, gave us the opportunity to reflect on some important considerations we had underestimated until then. Essentially, we asked ourselves the following questions:
1. Why can't what happened to them happen to us?
Why should a backup system failure only happen to other hosting providers and not to us?
It would be hypocritical, right? It's like saying we can skip wearing seat belts when we drive because accidents only happen to other people. Instead, the most honest answer is that until then, until that memorable summer day, the only reason it hadn't happened to us was chance, pure and simple luck: the fact that a very advanced snapshot backup system had never generated corrupted backups. A remote, improbable eventuality, but not an impossible one, as we saw that very day.
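This is also why, since then, we treat a "completed" backup as unverified until proven otherwise. As a purely illustrative sketch, assuming backups land as plain .tar.gz archives in a known directory (a hypothetical layout, not our actual stack), even a trivial check like the following would have flagged thirty consecutive empty archives long before anyone needed them:

```python
# Minimal sanity check that would have caught the "EMPTY" backups in this
# story: a restore that completes "successfully" means nothing until you
# verify the archive actually contains data. Assumes backups are plain
# .tar.gz files under BACKUP_DIR (a hypothetical layout).
import tarfile
from pathlib import Path

BACKUP_DIR = Path("/backup/daily")     # hypothetical path
MIN_FILES = 100                        # fewer files than this is suspicious
MIN_BYTES = 10 * 1024 * 1024           # an archive smaller than 10 MB is suspicious

def verify(archive: Path) -> bool:
    """Return True only if the archive is readable and plausibly complete."""
    if archive.stat().st_size < MIN_BYTES:
        return False
    with tarfile.open(archive, "r:gz") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
    return len(members) >= MIN_FILES

for archive in sorted(BACKUP_DIR.glob("*.tar.gz")):
    status = "OK" if verify(archive) else "SUSPICIOUS - investigate/alert"
    print(f"{archive.name}: {status}")
```

A periodic test restore to a scratch environment is, of course, an even stronger check than counting files and bytes.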
2. How much does data loss affect people's lives?
Losing a website or data can spell the end of a business. That means inflicting economic damage on people's lives, possibly leaving them unable to afford basic necessities. This cannot and must not happen; at the very least, it must not happen through our fault, neither as a cause nor as a contributing cause.
3. How much does data loss affect our company?
The loss of a customer's data most likely means a legal dispute. It hardly matters who is right or wrong, or what the signed contracts say about indemnities and SLAs: there are clear legal obligations, such as those imposed by the European GDPR, under which we could be accused of various omissions and therefore sentenced to fines and compensation. Better to invest in data security, allocating 10% of turnover to monitoring systems, RAID, redundant storage and multiple backups, than to risk courts, litigation, compensation and fines.
What did we do then?
Having understood the three points above loud and clear, we "simply" added a secondary backup to the already functioning backup system, in turn made redundant on a C14 data storage system with military-grade certification in France.
In short, where before our RAID1 systems had a single backup flowing into a RAID5 storage area, today we have three backups, made with two different technologies, converging on three different RAID5 storage systems; one of these is in turn mirrored to C14, a redundant, military-grade, nuclear-proof storage service in France.
With this modus operandi, an unfortunate customer who needs to restore a backup can rely on a remote backup which, if it doesn't work (a very remote hypothesis), can fall back on the second backup system, which in turn, if it doesn't work (itself a very remote hypothesis), can fall back on the rsync mirror on yet another storage system.
And if there were a major disaster at the datacenter (let's say a nuclear explosion), we would still have a weekly mirror on the C14 anti-atomic storage in France.
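To give a rough idea of the fan-out logic (hostnames and paths below are invented for illustration; the real targets and tooling differ), the mirroring step boils down to pushing the same backup to several independent destinations and raising an alert as soon as any one of them fails:

```python
# Minimal sketch of the "fan-out" idea described above: one local backup
# pushed to several independent storage targets, so a single failed copy
# never leaves us without alternatives. Hostnames and paths are invented
# for illustration; the real targets (RAID5 arrays, the C14 mirror) differ.
import subprocess

SOURCE = "/backup/daily/"                        # local backup to replicate
TARGETS = [
    "backup1.example.net:/storage/daily/",       # first RAID5 storage (hypothetical)
    "backup2.example.net:/storage/daily/",       # second RAID5 storage (hypothetical)
    "c14-gateway.example.net:/archive/weekly/",  # off-site mirror (hypothetical)
]

failures = []
for target in TARGETS:
    # -a preserves permissions and timestamps, --delete keeps the mirror exact.
    result = subprocess.run(["rsync", "-a", "--delete", SOURCE, target])
    if result.returncode != 0:
        failures.append(target)

if failures:
    # In a real setup this would page someone instead of just printing.
    print("Mirroring failed for:", ", ".join(failures))
```

The design choice is simple: every destination is written independently, so the failure of one copy is detected immediately and never silently propagates to the others.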
In short, the possibility of losing customer data has truly been reduced to the verge of impossibility. To our knowledge, no stronger precautions currently exist in Italy, or internationally for that matter, given that even very large companies with turnovers of hundreds of millions of euros continue to rely on a single backup as the definitive solution to protect themselves and their customers.
In terms of internal policies (as we have always done), we continue to offer the backup service included in our plans. There is no way it should be an additional, value-added service with a separate price. We offer backup included: it is up to us to protect customers in the best possible way, regardless of their optimism that nothing will ever happen to their data, or their belief that a backup won't help.
In short, atomic bomb proof!