Comprehensive Guide on Linux RAIDs with mdadm
For a little over the past two years I've been working on software RAIDs at Intel. As I'm leaving Intel, I thought I could share my knowledge on RAIDs and preserve it for my future self.
Intel uses the proprietary IMSM metadata format, which you can read more about here. But I'm not gonna talk about it, because it's intended for enterprise use and I'm not allowed to due to NDAs and such.
So I will talk about Linux RAIDs in general, using native Linux metadata. This post is going to be a short and sweet introduction to Linux RAIDs, aggregating all the knowledge wandering around on the topic that I could find, see References.
What is RAID?
RAID (Redundant Array of Independent Disks) is a storage solution that allows you to combine multiple physical storage devices into one logical unit. Later in this post I will refer to it as a RAID array or RAID volume.
Data on RAID arrays can be distributed in multiple ways. These are referred to as RAID levels.
When it comes to data distribution on RAIDs, there are two important goals:
- splitting the data onto multiple drives so that there are speed benefits,
- data redundancy so that when any of the drives fails, no data is lost.
RAID levels
There are multiple RAID levels, with different characteristics.
Note
I'm gonna refer to RAID 0 and RAID 1 as the two base RAID levels, because the other RAID levels are kind of a combination of these two.
RAID 0
It is the first of the two base RAID levels and is crucial to understanding how RAIDs work.
Only data striping is used: the data is split into blocks that are distributed across the storage devices used to create the array.
RAID 1
It is the second of the two base RAID levels. It doesn't offer any speed improvements from distributing data over multiple storage devices.
Instead, every storage device holds a perfect copy of the data. The data is mirrored across all the drives used.
Note
The size of a RAID 1 array is equal to the size of the smallest array member.
RAID 4
It is kind of a combination of RAID 1 and RAID 0. The pillars of RAID are achieved firstly by data striping, similarly to RAID 0 (that's where the speed gains come from).
However, RAID 4 also provides data redundancy by designating a special parity drive. It contains parity blocks calculated from the data stored on the other drives.
Note
The parity blocks are nothing more than XORs of the data blocks contained in the array. When one of the drives is lost, we can use the XOR operation on the remaining data blocks along with the parity blocks and recover the lost data.
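As a toy illustration (plain shell arithmetic here, nothing mdadm-specific, and the byte values are made up): XOR-ing two data bytes gives the parity byte, and XOR-ing the parity with the surviving byte recovers the lost one.
$ d1=0xA5; d2=0x3C                                 # two data "blocks", one byte each
$ printf 'parity    = 0x%02X\n' $(( d1 ^ d2 ))     # parity block = d1 XOR d2
parity    = 0x99
$ printf 'recovered = 0x%02X\n' $(( 0x99 ^ d2 ))   # d1 lost; d2 XOR parity brings it back
recovered = 0xA5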
Important
At least 3 drives are needed
RAID 5
Almost identical to RAID 4, with one crucial difference: the parity blocks are not stored on a special parity drive, but rather distributed across all the drives in the array along with the data blocks.
Important
At least 3 drives are needed
The distribution of parity blocks can be done in multiple ways.
RAID 6
It's similar to RAID 5, but takes the idea of parity a bit further. In RAID 5, losing one drive doesn't lose any data, but losing more than one drive does.
That's the scenario RAID 6 covers: there will be no data loss even if two of the used drives fail. The actual way of computing the parity blocks is described in great detail on Igor Ostrovsky's blog.
Important
At least 4 drives are needed
RAID 10
It's a literal combination of RAID 0 and RAID 1 (hence the name RAID 10), where RAID 1 arrays are combined into a RAID 0 array.
Important
At least 4 drives are needed
Creating the RAID array
To manage RAIDs in Linux, the mdadm tool can be used. It communicates with the md (multiple devices) kernel driver and allows you to create, manage and modify RAIDs.
Before starting, please install the mdadm package in your distro of choice.
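For example (assuming one of the two most common package managers; the package is simply called mdadm on both):
$ sudo dnf install mdadm    # Fedora / RHEL
$ sudo apt install mdadm    # Debian / Ubuntu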
Warning
Before following any instructions please backup your data.
To create a RAID 0 array, the following command can be used:
$ mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdd /dev/sde --run
# shortened
$ mdadm -CR /dev/md0 -l 0 -n 2 /dev/sd[d-e]
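As a side note, the stripe chunk size can be chosen at creation time with the --chunk (-c) option; when omitted, mdadm falls back to a default (512K in the outputs below). A hypothetical variant with a 128K chunk:
$ mdadm -CR /dev/md0 -l 0 -n 2 --chunk=128 /dev/sd[d-e]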
It will create a RAID named md0 with level 0 and two devices: /dev/sdd and /dev/sde.
To check which devices you could use in your system, the following command can be used:
$ lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sda
└─sda1 ext4 1.0 b0020384-55f4-48df-9f96-b6fb5a15e023 48.6G 76%
sdb
├─sdb1
├─sdb2 ntfs 18F85AC5F85AA134
└─sdb3 ext4 1.0 33ca8a5d-f2de-47b1-8977-c6240c41561b
sdc
└─sdc1 ntfs 68E831D8E831A56A 17.4G 96%
sdd linux_raid_member 1.2 fedora:0 5a7a91ff-1677-680d-acb8-da21d1f7a332
└─md0
sde linux_raid_member 1.2 fedora:0 5a7a91ff-1677-680d-acb8-da21d1f7a332
└─md0
sdf
sr0
zram0 [SWAP]
nvme0n1
├─nvme0n1p1 vfat FAT32 4D0A-3789 581.4M 3% /boot/efi
├─nvme0n1p2 ext4 1.0 ea5c584d-7de2-473a-a77d-bf93b18d451f 568.7M 35% /boot
└─nvme0n1p3 btrfs fedora_localhost-live c0fc34a9-7e29-4c17-9ffd-ae385d7f6c29 /var/lib/docker/btrfs
/home
/
As you can see, in this case the drives sdd and sde were used for RAID 0. They got a special linux_raid_member filesystem type.
To check whether our RAID 0 was created properly, we can read the /proc/mdstat file. It is a special file filled out by the md kernel module in real time, and it provides up-to-date information on the RAIDs created in the system.
$ cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 sde[1] sdd[0]
22571520 blocks super 1.2 512k chunks
unused devices: <none>
As you can see, the device md0 has been created and is active.
The Personalities line in the above example lists the RAID levels currently handled by the system.
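Besides /proc/mdstat, mdadm itself can describe an array; the --detail option prints its level, state, member devices and UUID:
$ mdadm --detail /dev/md0
# shortened
$ mdadm -D /dev/md0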
Any other RAID level can be created the same way as before:
$ mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdd /dev/sde /dev/sdf --run
# shortened
$ mdadm -CR /dev/md0 -l 5 -n 3 /dev/sd[d-f]
Just remember to pass the appropriate number of drives.
When creating any of the RAIDs with parity, you will encounter a recovery progress bar in your /proc/mdstat file.
$ cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md0 : active raid5 sdf[3] sde[1] sdd[0]
15120384 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
[>....................] recovery = 0.1% (14736/7560192) finish=51.1min speed=2456K/sec
unused devices: <none>
As you read before, RAIDs with parity contain parity blocks, stored either on one dedicated drive or scattered across multiple drives. No matter where they are located, these parity blocks have to be calculated, and that's what this recovery means.
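Re-reading the file by hand gets tedious; assuming the watch utility is available, the progress can be refreshed automatically:
$ watch -n 1 cat /proc/mdstat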
Removing RAID array
Stopping RAID array
To remove a RAID, the array first has to be stopped:
$ mdadm --stop /dev/md0
Frequently the --stop option is used along with the --scan option to stop all arrays.
$ mdadm -Ss
Assembling stopped RAID array
Sometimes you might want to start the array again after stopping it. To do that, the --assemble option can be used.
$ mdadm --assemble /dev/md0 /dev/sd[d-f]
Frequently the --scan option is used alongside --assemble:
$ mdadm --assemble --scan
# shortened
$ mdadm -As
It will assemble all arrays that can be assembled.
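For assembly to work reliably across reboots, the arrays are usually also recorded in the mdadm config file (the path differs between distros, e.g. /etc/mdadm.conf or /etc/mdadm/mdadm.conf):
$ mdadm --detail --scan >> /etc/mdadm.conf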
Resetting RAID drives
After stopping the array, the disks can be reset:
Warning
It will destroy your data.
$ mdadm --zero-superblock /dev/sdd /dev/sde
# or shortened
$ mdadm --z /dev/sdd /dev/sde
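Assuming nothing else left a signature on the drives, they should no longer show up as linux_raid_member afterwards; this can be checked with the same lsblk call as before:
$ lsblk -f /dev/sdd /dev/sde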
Grow operations
When using RAID, at some point you might want to add an additional drive. To do this (and to make any other changes to a working array), the grow operation can be used.
Additional drive
Let's use a 2-disk RAID 1 array.
$ mdadm -CR /dev/md0 -l 1 -n 2 /dev/sd{d,f}
$ cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid1]
md0 : active raid1 sdf[1] sdd[0]
7561472 blocks super 1.2 [2/2] [UU]
[>....................] resync = 0.7% (58240/7561472) finish=34.3min speed=3640K/sec
unused devices: <none>
Wait for resync to end.
Note
The --assume-clean option can be used to tell mdadm that a resync is not needed.
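For example, the same mirror could be created without the initial resync (only sensible when the drives hold identical or throwaway data):
$ mdadm -CR /dev/md0 -l 1 -n 2 --assume-clean /dev/sd{d,f}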
Adding spare drive
Firstly, the --add operation has to be performed to add a drive to the array.
$ mdadm --add /dev/md0 /dev/sde
It will be added as a spare drive.
Note
A spare drive is a drive that will be pulled into the array in case of a failure of any of the drives already in the array.
$ cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid1]
md0 : active raid1 sde[2](S) sdf[1] sdd[0]
7560640 blocks super 1.2 [2/2] [UU]
unused devices: <none>
Spare drives are denoted by (S).
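As an aside, the spare's purpose can be observed by manually failing an active member on a throwaway array (skip this if you are following along to the grow step below, as it changes the array's state); md then immediately starts rebuilding onto the spare:
$ mdadm /dev/md0 --fail /dev/sdd
$ mdadm /dev/md0 --remove /dev/sdd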
Grow operation
After adding the spare, the actual grow operation can be performed.
$ mdadm --grow /dev/md0 --raid-devices=3
Then you'll just have to wait for recovery to end.
$ cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid1]
md0 : active raid1 sde[2] sdf[1] sdd[0]
7560640 blocks super 1.2 [3/2] [UU_]
[=====>...............] recovery = 26.8% (2029696/7560640) finish=104.8min speed=878K/sec
unused devices: <none>
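Instead of polling /proc/mdstat, mdadm can also block until the rebuild is done:
$ mdadm --wait /dev/md0
# shortened
$ mdadm -W /dev/md0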
Changing RAID level
Another operation that can be done is RAID level migration.
Let's change our RAID 1 into RAID 0. No additional devices are needed, so a simple grow command is enough.
$ mdadm --grow /dev/md0 --level=0
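The migration can be verified the same way as before; assuming it succeeded, the level reported for md0 should now be raid0:
$ cat /proc/mdstat
$ mdadm -D /dev/md0 | grep 'Raid Level'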
Contributing to Linux RAID
Git repositories are publicly available:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git
Any changes to the source code or questions can be sent to the linux-raid mailing list.
References
https://wiki.archlinux.org/title/RAID
https://raid.wiki.kernel.org/index.php/A_guide_to_mdadm
https://en.wikipedia.org/wiki/RAID
https://raid.wiki.kernel.org/index.php/Growing
Diagrams made by: Cburnett