Replacing a drive in a ZFS array

I have a ZFS RAIDZ2 array in my main server and recently it suffered a drive failure. Naturally I was woefully underprepared for this occurrence, so I rushed out, bought a replacement drive and got cracking with learning how to safely replace the failed one.

Background

If you don’t already have one, you’ll need a ZFS array to play with. At least the first part of this article is based around the test array I built in a previous article.

Replacing a Disk

Replacing a disk is quite simple if you have a spare slot in your system for the replacement drive; it’s slightly more involved if there are no spare slots.

Start by using zpool status to show which devices are present in your pool. This is important: you don’t want to accidentally work with the wrong device, and that’s easily done when the drives are identical models.

zpool status
  pool: tank
 state: ONLINE
config:

        NAME                                           STATE     READ WRITE CKSUM
        tank                                           ONLINE       0     0     0
          raidz1-0                                     ONLINE       0     0     0
            ata-ST3160812AS_5LS9ADML                   ONLINE       0     0     0
            ata-WDC_WD3200AAKX-001CA0_WD-WCAYUL400860  ONLINE       0     0     0

errors: No known data errors

Since you added the devices by ID, you’ll get the whole ID of each device as its name.

Add the new drive to the system and then run fdisk to find all the known devices:

fdisk -l

... snip ...

Disk /dev/sdd: 149.01 GiB, 160000000000 bytes, 312500000 sectors
Disk model: ST3160812AS     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 2F90C290-3476-EE4C-9647-E00D10BF4DD1

Device         Start       End   Sectors  Size Type
/dev/sdd1       2048 312481791 312479744  149G Solaris /usr & Apple ZFS
/dev/sdd9  312481792 312498175     16384    8M Solaris reserved 1

I’ve snipped out the devices that were present in the base system (detailed in the previous article) and only show the newly added device here. The new device is another 160GB drive, exactly the same model as the one already in the pool. It already has a filesystem on it, so it needs wiping; the filesystem present is an old ZFS pool I created the last time I was learning this. List devices by ID to find the ID of the new drive. Take care here to pick the correct device, and triple check that the device you are working with is not in the existing pool!

ls -l /dev/disk/by-id/

total 0
lrwxrwxrwx 1 root root  9 Jul 29 09:43 ata-ST2000DL003-9VT166_5YD1KN65 -> ../../sda
lrwxrwxrwx 1 root root 10 Jul 29 09:43 ata-ST2000DL003-9VT166_5YD1KN65-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 29 09:43 ata-ST2000DL003-9VT166_5YD1KN65-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jul 29 09:43 ata-ST2000DL003-9VT166_5YD1KN65-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Jul 29 12:54 ata-ST3160812AS_5LS92RAL -> ../../sdd
lrwxrwxrwx 1 root root 10 Jul 29 12:54 ata-ST3160812AS_5LS92RAL-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Jul 29 12:54 ata-ST3160812AS_5LS92RAL-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 Jul 29 11:40 ata-ST3160812AS_5LS9ADML -> ../../sdb
lrwxrwxrwx 1 root root 10 Jul 29 11:40 ata-ST3160812AS_5LS9ADML-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Jul 29 11:40 ata-ST3160812AS_5LS9ADML-part9 -> ../../sdb9
lrwxrwxrwx 1 root root  9 Jul 29 11:40 ata-WDC_WD3200AAKX-001CA0_WD-WCAYUL400860 -> ../../sdc
lrwxrwxrwx 1 root root 10 Jul 29 11:40 ata-WDC_WD3200AAKX-001CA0_WD-WCAYUL400860-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Jul 29 11:40 ata-WDC_WD3200AAKX-001CA0_WD-WCAYUL400860-part9 -> ../../sdc9

... snip ...

The newly added device is “ata-ST3160812AS_5LS92RAL”. I can tell this because I know that “ata-ST2000DL003-9VT166_5YD1KN65” is my OS drive and “ata-ST3160812AS_5LS9ADML” and “ata-WDC_WD3200AAKX-001CA0_WD-WCAYUL400860” are listed in the pool output.
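
If you want one final check before touching the disk, and assuming smartmontools is installed, you can confirm the drive’s serial number directly:

smartctl -i /dev/disk/by-id/ata-ST3160812AS_5LS92RAL

The serial number in the output should match the one embedded in the by-id name (5LS92RAL here) and the sticker on the physical drive.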

Issue a wipe command on the new device:

wipefs -a /dev/disk/by-id/ata-ST3160812AS_5LS92RAL

/dev/disk/by-id/ata-ST3160812AS_5LS92RAL: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/disk/by-id/ata-ST3160812AS_5LS92RAL: 8 bytes were erased at offset 0x2540be3e00 (gpt): 45 46 49 20 50 41 52 54
/dev/disk/by-id/ata-ST3160812AS_5LS92RAL: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/disk/by-id/ata-ST3160812AS_5LS92RAL: calling ioctl to re-read partition table: Success

Running fdisk -l will now show no filesystem on the device. More importantly, use zpool status to check that the pool is still good; the output should exactly match what is shown above.
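
Both checks are quick to run explicitly:

fdisk -l /dev/disk/by-id/ata-ST3160812AS_5LS92RAL
zpool status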

Now issue a replace command to replace the existing 320GB Western Digital drive with the new drive. The new drive is smaller than the drive it’s replacing, which is usually not allowed, but it works here because the array’s usable capacity was originally limited by the 160GB drive, as it was built with mismatched drives.

zpool replace tank /dev/disk/by-id/ata-WDC_WD3200AAKX-001CA0_WD-WCAYUL400860 /dev/disk/by-id/ata-ST3160812AS_5LS92RAL

The command will return nothing if it works. Running zpool status will now show the pool made up of the two Seagate drives.

zpool status

  pool: tank
 state: ONLINE
  scan: resilvered 1.24M in 00:00:01 with 0 errors on Mon Jul 29 14:11:08 2024
config:

        NAME                          STATE     READ WRITE CKSUM
        tank                          ONLINE       0     0     0
          raidz1-0                    ONLINE       0     0     0
            ata-ST3160812AS_5LS9ADML  ONLINE       0     0     0
            ata-ST3160812AS_5LS92RAL  ONLINE       0     0     0

errors: No known data errors

Notice that it reports that the drive was resilvered and that it took just one second. In a real system the resilver will usually take much longer; this pool only has a tiny amount of data on it. The Western Digital 320GB drive can now be removed from the system. Running zpool status again after removal should show the pool as still being fine, and obviously the removed drive will no longer appear in fdisk and the like.

Running the disk replacement on my main server gives output like this while the resilvering is taking place:

zpool status

  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Jul 29 14:56:44 2024
        237G / 38.1T scanned at 19.7G/s, 0B / 38.1T issued
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                     STATE     READ WRITE CKSUM
        tank                                     ONLINE       0     0     0
          raidz2-0                               ONLINE       0     0     0
            ata-ST16000NM001G-2KK103_ZL2PSV35    ONLINE       0     0     0
            ata-ST16000NM001G-2KK103_ZL2ATVC1    ONLINE       0     0     0
            ata-ST16000NM001G-2KK103_ZL2PSG9B    ONLINE       0     0     0
            ata-ST16000NM001G-2KK103_ZL22R6W2    ONLINE       0     0     0
            replacing-4                          ONLINE       0     0     0
              ata-ST16000NM001G-2KK103_ZL21R7LK  ONLINE       0     0     0
              ata-ST16000NM001G-2KK103_ZL2EPHQB  ONLINE       0     0     0

errors: No known data errors

With 38T to resilver it’s going to take a little more time than the one second on the test system (probably about 24 hours). The replacing-4 entry is a temporary vdev that ZFS inserts while a replacement is in progress: it holds both the outgoing and incoming drives, and the number is the (zero-based) position of the device being replaced within the raidz2 vdev.
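
To keep an eye on progress without retyping the command, something like this works (assuming watch is available, as it is on most Linux distributions):

watch -n 60 zpool status tank

Once the resilver completes, the replacing-4 vdev disappears and only the new drive remains in that position.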

Further Thoughts

While resilvering the array I got to thinking about how the array functions when it has a working disk that is being replaced. Asking on Reddit, it seems it’s better to leave the failing disk in the array while it’s resilvering: apparently it makes the resilvering faster and can make it safer. See here. Additionally, this means it’s always a good idea to have a spare slot free for a replacement disk to go in.
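
For completeness, if you don’t have a spare slot the rough flow (which I haven’t tested here; the device names are just the ones from my test pool) is to take the failed disk offline, power down and physically swap it, then tell ZFS to replace it:

zpool offline tank ata-WDC_WD3200AAKX-001CA0_WD-WCAYUL400860

Power down, swap the drives, boot back up, and then:

zpool replace tank ata-WDC_WD3200AAKX-001CA0_WD-WCAYUL400860 /dev/disk/by-id/ata-ST3160812AS_5LS92RAL

The pool runs degraded from the moment the old disk is offlined until the resilver finishes, which is exactly the situation the advice above is warning about.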