Once upon a time, I was hired as the second engineer in a startup.
On the first day, my task was to create a script to anonymize the data in a dump of the production db so we could use the anonymized data on staging and for local development.
We did not have much data in our production Postgres database, so creating a dump and running the anonymization script took just 2-3 minutes. After writing the anonymization script, I wanted to get a fresh dump of the production db to install an anonymized version of it on staging and in my local dev environment.
All this was still a very manual process. I had multiple terminals open, one for each database (production, staging, and my local dev database) and one or two terminals for copying the database dumps around, and running the anonymization script.
I created a dump of the production database in one terminal and ran the anonymization script in another one. In a third terminal, I dropped the existing staging database and loaded the production dump onto the staging server.
I switched terminal windows again and dropped my local database to load the same production dump onto my local dev machine. I switched to the terminal that I had used to load the dump into staging and changed the command to load the dump into the local database. But the command failed with "database already exists".
My heart skipped a beat: Which database did I just drop? I switched back to the terminal where I had dropped the database a few moments ago. This was not the connection to my local Postgres server but to the production database server! I had just dropped our production database...
My eyes focused on the black terminal window and everything around it became a bit blurred.
My phone rang. It was the founder of the startup. I thought: "Does he know?"
I picked up the phone and he told me that a customer had called him because they could not log in to our service, and he asked if I could take a look.
"No problem, I'll take a look." I hung up the phone.
I took a couple of deep breaths. "I still have the dump of the production db from five minutes ago. It is not a total catastrophe."
I double- and triple-checked which terminal I was in and which dump I was about to restore. Then I restored our production database from the dump. After verifying that login was working again, I informed the founder that everything was back up and that it had been a small glitch with the database. Fortunately, our system was built in a way that there was no data loss, just a couple of minutes of downtime. Phew!
After that, I had to lie down for a couple of minutes.
Later, I built a system where an AWS Lambda function wakes up every night and creates an anonymized dump of the production database. We also added a Slack bot that any developer could ask for a download link to the latest anonymized dump (the link was valid for 3 minutes and then expired).
No developer needs access to the production database or to real production data to do a good job.
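The story does not say how the short-lived link was generated (on AWS, an S3 presigned URL would be a common choice). As a rough, standard-library-only sketch of the expiring-link idea, here is one way a link could carry its own expiry time plus a signature. The secret, the path, and all function names are hypothetical and not taken from the original system.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical values for illustration only; not from the original system.
SECRET_KEY = b"replace-with-a-real-secret"
LINK_TTL_SECONDS = 3 * 60  # the link in the story was valid for 3 minutes


def sign(path: str, expires_at: int) -> str:
    """Sign the path together with its expiry so neither can be tampered with."""
    message = f"{path}:{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_download_link(path: str, now: Optional[float] = None) -> str:
    """Return a link that embeds its expiry timestamp and a signature."""
    issued_at = int(time.time() if now is None else now)
    expires_at = issued_at + LINK_TTL_SECONDS
    return f"{path}?expires={expires_at}&sig={sign(path, expires_at)}"


def is_link_valid(path: str, expires_at: int, sig: str,
                  now: Optional[float] = None) -> bool:
    """Accept the link only if it has not expired and the signature matches."""
    current = time.time() if now is None else now
    if current > expires_at:
        return False
    return hmac.compare_digest(sig, sign(path, expires_at))
```

A bot would call `make_download_link` when a developer asks for the dump, and the download endpoint would call `is_link_valid` before serving the file.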
We made it hard to screw things up and easy to do the right thing.
This became our general philosophy in the startup on how to set things up for success. And it kind of worked (at least the startup is still around).
PS: To this day, I close terminal windows connected to anything important as soon as possible so that I never make such a mistake again.
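To illustrate "hard to screw things up" in the context of this story: a small wrapper could refuse to build a destructive command when the target is a protected host. This is only a sketch of the idea, not what the startup actually ran. The host name and helper names are made up; `dropdb` and its `--host` option are the standard Postgres client utility.

```python
from typing import List

# Hypothetical host name; in practice this would come from configuration.
PROTECTED_HOSTS = {"prod-db.example.internal"}


class ProtectedTargetError(Exception):
    """Raised when a destructive command targets a protected host."""


def build_drop_command(host: str, database: str) -> List[str]:
    """Build a dropdb invocation, refusing outright for protected hosts."""
    if host in PROTECTED_HOSTS:
        raise ProtectedTargetError(
            f"refusing to drop '{database}' on protected host '{host}'"
        )
    return ["dropdb", "--host", host, database]
```

The resulting list could be passed to `subprocess.run`. The point is that the check happens in code, not in the memory of whoever is juggling terminal windows.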