Fixing the RAID Array

One of the main things I wanted when I reinstalled the server was to take advantage of the RAID features which are part of the Ubuntu installation process. Most desktop computers only have space for one hard drive, and seeing as this server would hold important files such as family documents, business data, photos and home movies, it had to be resilient enough not to fail for predictable reasons. This meant fitting a second hard drive and configuring RAID.

By removing the DVD drive (you're not going to need that much on a server) you can replace it with a hard drive caddy tray and a 2.5 inch drive to give yourself a second storage device. I went with a 1 TB HDD, as 1 TB solid state drives are still quite expensive. When installing Ubuntu 18.04, I also chose to set up RAID as part of the process. Setting up RAID is one thing, though; testing it and proving that it actually works is another.

Recently I was looking into how to monitor the array and check whether a drive had ever failed. Running fdisk -l showed me the two disks and the partitions set up on them, and lsblk also showed the disks to be present and correct, or so I thought. It was only when I ran the RAID-specific commands that something strange started to show.
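For reference, these are the general-purpose commands I used for that first look (run them on your own server; the device names they print will depend on your hardware):

    sudo fdisk -l    # list all disks and their partition tables
    lsblk            # show block devices, partitions and mount points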

mdadm is the utility used to manage software RAID arrays and has a wealth of commands for getting details and managing disks and arrays. My array is /dev/md0, so running sudo mdadm --detail /dev/md0 gave a lot of information, and what stood out was that one of the drives was marked as 'removed'. I then ran cat /proc/mdstat, which showed that only one drive was active in the array, as indicated by the [_U].
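As a rough illustration, a degraded two-disk RAID1 array shows up in /proc/mdstat something like this (the device name sdb2 and the block count here are placeholders, not my actual output):

    $ cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sdb2[1]
          976630464 blocks super 1.2 [2/1] [_U]

    unused devices: <none>

The [2/1] means the array expects two members but only has one, and the underscore in [_U] marks the missing member.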

To check if there was a problem with the disk itself, i.e. had it failed at some point, I installed smartmontools and ran sudo smartctl -a /dev/sda. SMART wasn't reporting a problem on the drive itself, but it looked like the disk somehow wasn't part of the array. Whether that meant something had happened to remove it from the array or it wasn't set up correctly to begin with I don't know, but it looked like the drive simply needed to be added back to the array. BEFORE doing anything with the disks though, I ran my backup script so all important files were copied up to the cloud (Amazon AWS). Better safe than sorry.
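If you want to run the same health check, something like this should do it (assuming the suspect disk really is /dev/sda on your machine):

    sudo apt install smartmontools    # provides the smartctl utility
    sudo smartctl -a /dev/sda         # full SMART report for the drive
    sudo smartctl -H /dev/sda         # just the overall health verdict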

I then ran sudo mdadm --manage /dev/md0 --add /dev/sda2 and was pleased to see the array start to rebuild itself.
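You can keep an eye on the rebuild while it runs; either of these works (again, /dev/md0 and /dev/sda2 are my device names, so substitute your own):

    sudo mdadm --manage /dev/md0 --add /dev/sda2    # put the missing partition back into the array
    watch -n 5 cat /proc/mdstat                     # recovery progress shows as a percentage
    sudo mdadm --detail /dev/md0                    # State should end up 'clean' with both drives active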

After not too long, and a bit of checking in, the array looked to be fully rebuilt. In the next few days I want to go up to the attic, detach each drive in turn and test booting the server, to confirm that the content really is mirrored across both drives and that the array can survive the failure of one disk.
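There is also a software way to rehearse a failure without touching the hardware: mdadm can mark a member as failed, and you can then remove and re-add it once you are satisfied. Roughly like this (the device name is a placeholder, and re-adding does trigger a real resync):

    sudo mdadm --manage /dev/md0 --fail /dev/sdb2      # pretend the second member has died
    cat /proc/mdstat                                   # the array should now show [_U] again
    sudo mdadm --manage /dev/md0 --remove /dev/sdb2    # take the 'failed' member out
    sudo mdadm --manage /dev/md0 --add /dev/sdb2       # add it back and let it resync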

Another important step is to set up your email so that, if a drive does fail, the server sends you an email notification. No point in having a drive die and not knowing about it. I tried doing this but it requires postfix to be configured on the server. I'm having a little trouble with that at the moment, so once I get it working correctly I will write another post.
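For completeness, the mdadm side of it is just a one-line setting; it's the mail delivery (postfix) that still needs sorting out on my server. Roughly (the email address is obviously a placeholder):

    # in /etc/mdadm/mdadm.conf
    MAILADDR you@example.com

    # send a test notification for each array to confirm delivery works
    sudo mdadm --monitor --scan --test --oneshot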

Many thanks to the community at askubuntu.com for their help.

How your array should look