
Comprehensive Guide on Linux RAIDs with mdadm

2023-07-26

For a little more than the past 2 years I've been working on software RAIDs at Intel. As I'm leaving Intel, I thought I could share my knowledge on RAIDs and preserve it for my future self.

Intel uses the proprietary IMSM metadata format, which you can read more about here. But I'm not gonna talk about it, because it's intended for enterprise use and I'm not allowed to due to NDAs and such.

So I will talk about Linux RAIDs in general, using the native Linux metadata. This post is going to be a short and sweet introduction to Linux RAIDs, aggregating all the knowledge I could find wandering around on the topic, see References.

What is RAID?

RAID (Redundant Array of Independent Disks) is a storage solution that allows you to combine multiple physical storage devices into one logical unit. Later in this post I will refer to it as a RAID array or RAID volume.

Data on RAID arrays can be distributed in multiple ways, referred to as RAID levels.

When it comes to data distribution on RAIDs, there are two important concerns:

  • splitting the data across multiple drives so that there are speed benefits,
  • data redundancy so that when any of the drives fails, no data is lost.

RAID levels

There are multiple RAID levels, with different characteristics.

Note

I'm gonna refer to RAID 0 and RAID 1 as the two base RAID levels, because the other RAID levels are essentially combinations of these two.

RAID 0

It is the first of the two base RAID levels and is crucial to understanding how RAIDs work.

Only data striping is used: the data is split into blocks that are distributed over the storage devices used to create the array.

RAID 0 chart

RAID 1

It is the second of the two base RAID levels. It doesn't provide any speed improvements from distributing data over multiple storage devices.

However, every storage device holds a perfect copy of the data. The data is mirrored across all used drives.

Note

The size of a RAID 1 array is equal to the size of the smallest array member.

RAID 1 chart

RAID 4

It is kind of a combination of RAID 1 and RAID 0. Data is striped across the drives similarly to RAID 0, which is where the speed gains come from.

However, RAID 4 also provides data redundancy by designating a special parity drive. It contains parity blocks calculated from the data stored on the other drives.

RAID 4 chart

Note

The parity blocks are nothing more than XORs of the corresponding data blocks in the array. When one of the drives is lost, we can XOR the remaining data blocks with the parity blocks and recover the lost data.
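
As a toy illustration (plain shell arithmetic on single-byte "blocks", not how md actually stores anything), XOR-ing the parity with the surviving block gives back the missing one:

$ d1=170 d2=51              # two "data blocks"
$ parity=$(( d1 ^ d2 ))     # the parity block is their XOR
$ echo $(( parity ^ d2 ))   # the drive holding d1 is lost? recover it from parity and d2
170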

Important

At minimum 3 drives are needed

RAID 5

Almost identical to RAID 4, with one crucial difference: the parity blocks are not stored on a special parity drive, but rather distributed across all the drives in the array along with the data blocks.

RAID 5 chart

Important

At minimum 3 drives are needed

The distribution of parity blocks can be done in multiple ways.

RAID 6

It's similar to RAID 5, but takes the idea of parity a bit further. In RAID 5 we don't lose any data when a single drive fails, but if we lose more than one drive, the data is gone.

That's the scenario RAID 6 covers: there will be no data loss even if two of the used drives fail. The actual way of computing the parity blocks is described in great detail on Igor Ostrovsky's blog.

Important

At minimum 4 drives are needed

RAID 6 chart

RAID 10

It's a literal combination of RAID 0 and RAID 1 (hence the name RAID 10), where multiple RAID 1 arrays are combined into a single RAID 0 array.

Important

Requires at least 4 drives

RAID 10 chart

Creating the RAID array

To manage RAIDs in Linux, the mdadm tool can be used. It communicates with the md (multiple devices) kernel driver and allows you to create, manage and modify RAID arrays.

Before starting, please install the mdadm package in your distro of choice.
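
For example (the package is simply called mdadm in the major distros, but check yours):

$ sudo dnf install mdadm    # Fedora
$ sudo apt install mdadm    # Debian/Ubuntu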

Warning

Before following any of the instructions, please back up your data.

To create a RAID 0 array, the following command can be used:

$ mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdd /dev/sde --run

# shortened
$ mdadm -CR /dev/md0 -l 0 -n 2 /dev/sd[d-e]

It will create a RAID 0 array named md0 out of the two devices /dev/sdd and /dev/sde.

To check which devices you could possibly use in your system, you can list the block devices with lsblk:

$ lsblk -f
NAME        FSTYPE            FSVER LABEL                 UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                                           
└─sda1      ext4              1.0                         b0020384-55f4-48df-9f96-b6fb5a15e023   48.6G    76% 
sdb                                                                                                           
├─sdb1                                                                                                        
├─sdb2      ntfs                                          18F85AC5F85AA134                                    
└─sdb3      ext4              1.0                         33ca8a5d-f2de-47b1-8977-c6240c41561b                
sdc                                                                                                           
└─sdc1      ntfs                                          68E831D8E831A56A                       17.4G    96% 
sdd         linux_raid_member 1.2   fedora:0              5a7a91ff-1677-680d-acb8-da21d1f7a332                           
└─md0                                                                                                                    
sde         linux_raid_member 1.2   fedora:0              5a7a91ff-1677-680d-acb8-da21d1f7a332                           
└─md0                                                                                                                    
sdf                                                                                                           
sr0                                                                                                           
zram0                                                                                                         [SWAP]
nvme0n1                                                                                                       
├─nvme0n1p1 vfat              FAT32                       4D0A-3789                             581.4M     3% /boot/efi
├─nvme0n1p2 ext4              1.0                         ea5c584d-7de2-473a-a77d-bf93b18d451f  568.7M    35% /boot
└─nvme0n1p3 btrfs                   fedora_localhost-live c0fc34a9-7e29-4c17-9ffd-ae385d7f6c29                /var/lib/docker/btrfs
                                                                                                              /home
                                                                                                              /

As you can see, in this case the drives sdd and sde were used for RAID 0. They got the special linux_raid_member filesystem type.

To check whether our RAID 0 was created properly, we can read the /proc/mdstat file. It is a special file filled in by the md kernel module in real time, and it provides up-to-date information on the RAIDs present in the system.

$ cat /proc/mdstat
Personalities : [raid0] 
md0 : active raid0 sde[1] sdd[0]
      22571520 blocks super 1.2 512k chunks
      
unused devices: <none>

As you can see, the device md0 has been created and is active.

The Personalities line in the example above lists the RAID levels currently handled by the running kernel.
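
Besides /proc/mdstat, mdadm itself can report the state of a specific array. A quick sketch (the exact output depends on your system, so it is omitted here):

$ mdadm --detail /dev/md0

# shortened
$ mdadm -D /dev/md0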

A RAID of any other level can be created the same way as before:

$ mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdd /dev/sde /dev/sdf --run

# shortened
$ mdadm -CR /dev/md0 -l 5 -n 3 /dev/sd[d-f]

Just remember to pass the appropriate number of drives.

When creating any RAID level with parity, you will see a recovery progress bar in your /proc/mdstat file.

$ cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] 
md0 : active raid5 sdf[3] sde[1] sdd[0]
      15120384 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  0.1% (14736/7560192) finish=51.1min speed=2456K/sec
      
unused devices: <none>

As you read before, RAIDs with parity contain parity blocks, either on one dedicated drive or scattered across multiple drives. No matter where they are located, these parity blocks have to be calculated first, and that's what this recovery is.
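
If you want to follow that progress without re-running the command by hand, something like this works (watch is a separate utility, usually available by default):

$ watch -n1 cat /proc/mdstat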

Removing RAID array

Stopping RAID array

To remove a RAID, the array first has to be stopped:

$ mdadm --stop /dev/md0

Frequently the --stop option is used along with the --scan option to stop all arrays:

$ mdadm -Ss

Assembling stopped RAID array

Sometimes you might want to just start the array again after stopping it. To start the array once again, the --assemble option can be used.

$ mdadm --assemble /dev/md0 /dev/sd[d-f]

Frequently the --scan option is used alongside --assemble:

$ mdadm --assemble --scan

# shortened
$ mdadm -As

It will assemble all arrays that can be assembled.
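
--assemble --scan uses the arrays described in the mdadm config file (and otherwise scans device superblocks). If you want to record an array's definition there, a common approach is to append the output of --detail --scan (the file lives at /etc/mdadm.conf on Fedora-like systems and /etc/mdadm/mdadm.conf on Debian-like ones):

$ mdadm --detail --scan | sudo tee -a /etc/mdadm.conf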

Resetting RAID drives

After stopping the array, the member disks can be reset by wiping the RAID metadata:

Warning

It will destroy your data.

$ mdadm --zero-superblock /dev/sdd /dev/sde

Grow operations

When using a RAID, at some point you might want to add an additional drive. To do this (and to make any other changes to a working array), the grow operation can be used.

Additional drive

Let's use a 2-disk RAID 1 array:

$ mdadm -CR /dev/md0 -l 1 -n 2 /dev/sd{d,f}

$ cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid1] 
md0 : active raid1 sdf[1] sdd[0]
      7561472 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.7% (58240/7561472) finish=34.3min speed=3640K/sec
      
unused devices: <none>

Wait for resync to end.

Note

The --assume-clean option can be used to tell mdadm that a resync is not needed.
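
For example, a sketch of the create command from above with that option added (only use it on drives whose contents you don't care about or that you know already match):

$ mdadm -CR /dev/md0 -l 1 -n 2 /dev/sd{d,f} --assume-clean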

Adding spare drive

First, the --add operation has to be performed to add the drive to the array.

$ mdadm --add /dev/md0 /dev/sde

It will be added as a spare drive.

Note

A spare drive is a drive that will be taken into the array in case any of the drives already in the array fails.

$ cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid1] 
md0 : active raid1 sde[2](S) sdf[1] sdd[0]
      7560640 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>

Spare drives are denoted by (S).

Grow operation

After adding the spare, the actual grow operation can be performed.

$ mdadm --grow /dev/md0 --raid-devices=3

Then you'll just have to wait for recovery to end.

$ cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid1] 
md0 : active raid1 sde[2] sdf[1] sdd[0]
      7560640 blocks super 1.2 [3/2] [UU_]
      [=====>...............]  recovery = 26.8% (2029696/7560640) finish=104.8min speed=878K/sec
      
unused devices: <none>

Changing RAID level

Another operation that can be done is RAID level migration.

Let's change our RAID 1 into RAID 0. No additional devices are needed, so a simple grow command is enough.

$ mdadm --grow /dev/md0 --level=0
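
Once the migration finishes, you can confirm the new level, for example by filtering the mdadm --detail output (a sketch; the exact spacing of the output may differ):

$ mdadm --detail /dev/md0 | grep "Raid Level"
        Raid Level : raid0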

Contributing to Linux RAID

Git repositories are publicly available:

https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git

Any changes to the source code or questions can be sent to the linux-raid mailing list.

References

https://wiki.archlinux.org/title/RAID

https://raid.wiki.kernel.org/index.php/A_guide_to_mdadm

https://en.wikipedia.org/wiki/RAID

https://raid.wiki.kernel.org/index.php/Growing

Diagrams made by: Cburnett