The chapter discusses the following:
Filesystems can become fragmented over time. When a filesystem is fragmented, blocks of free space are small and files have many extents. The xfs_fsr command reorganizes filesystems so that the layout of the extents is improved. This improves overall performance. See the xfs_fsr(8) man page for more information.
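As a rough sketch of how a reorganization pass might be planned, the following shell function prints (rather than runs) one command to report fragmentation and one to reorganize the filesystem. The function name fsr_plan and the print-only approach are illustrative assumptions, not part of XFS. The frag command of xfs_db and the -v option of xfs_fsr are described in the xfs_db(8) and xfs_fsr(8) man pages.

```shell
# Print (do not run) the commands for a fragmentation check followed by a
# reorganization pass. The device argument is a placeholder supplied by the
# caller; xfs_fsr operates on a mounted filesystem.
fsr_plan() {
    device=$1
    # Report the fragmentation factor; -r opens the device read-only.
    echo "xfs_db -r -c frag $device"
    # Reorganize file extents; -v prints progress information.
    echo "xfs_fsr -v $device"
}
```

For example, fsr_plan followed by a device name prints the two commands so that they can be reviewed before being run by hand.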
Most often, a filesystem is corrupted because the system experienced a panic. This can be caused by system software failure, hardware failure, or human error (for example, pulling the plug). Another possible source of filesystem corruption is overlapping partitions.
There is no foolproof way to predict hardware failure. The best way to avoid hardware failures is to conscientiously follow recommended diagnostic and maintenance procedures.
Human error is probably the greatest single cause of filesystem corruption. To avoid problems, follow these rules closely:
Always shut down the system properly. Do not simply turn off power to the system. Use a standard system shutdown tool, such as the shutdown(8) command.
Never remove a filesystem physically (never pull out a hard disk) without first turning off power.
Never physically write-protect a mounted filesystem, unless it is mounted read-only.
Do not mount filesystems on dual-hosted disks on two systems simultaneously.
The best way to protect against data loss is to make regular, careful backups.
In some cases, XFS filesystem corruption, even on the root filesystem, can be repaired with the xfs_repair command. For more information, see the xfs_repair(8) man page and “Checking Filesystem Consistency”.
This section discusses the following:
You can use the following commands to check the consistency of a filesystem:
xfs_repair -n (no-modify mode)
xfs_check
Unlike fsck, neither xfs_check nor xfs_repair is invoked automatically on system startup. You should use these commands if you suspect a filesystem consistency problem.
The xfs_repair -n command checks XFS filesystem consistency without making any attempt to repair problems. It performs a more complete check than xfs_check. However, you can use xfs_check on filesystems with extended attributes. (xfs_repair performs only limited checking of extended attributes.) For more information about extended attributes, see the attr(1) man page.
The filesystem to be checked must have been unmounted cleanly using normal system administration procedures (the umount command or system shutdown), not as a result of a crash or system reset. If the filesystem has not been unmounted cleanly, mount it and unmount it cleanly before running xfs_check or xfs_repair -n.
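The mount-then-unmount sequence described above can be sketched as a short shell function that prints the commands in order. The function name replay_log_plan and both of its arguments are hypothetical placeholders. The sketch only prints the commands, so nothing is mounted or checked until you run them yourself.

```shell
# Print the commands that bring a filesystem to a cleanly unmounted state
# and then check it. Both arguments are placeholders supplied by the caller.
replay_log_plan() {
    device=$1
    mountpoint=$2
    echo "mount $device $mountpoint"   # mounting replays the log
    echo "umount $mountpoint"          # a clean unmount leaves a clean log
    echo "xfs_repair -n $device"       # read-only consistency check
}
```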
Caution: If you suspect problems with the root filesystem, you should use a boot disk or an alternate root to run xfs_repair.
The command line for xfs_repair -n is:
# xfs_repair -n device
device is the device file for a disk partition or logical volume that contains an XFS filesystem, such as /dev/xscsi/pci02.02.0-1/target3/lun0/part1
The following example shows output with no consistency problems found:
# xfs_repair -n /dev/xscsi/pci02.02.0-1/target3/lun0/part1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        ...
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        ...
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
For information about potential errors, see “Common xfs_repair Error Messages”.
For more details, see the xfs_repair(8) man page.
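When the check is run from a script, the exit status is easier to act on than the phase output. The following is a minimal sketch that assumes the exit status convention documented in recent versions of the xfs_repair(8) man page (0 when no corruption is detected in -n mode, 1 when corruption is detected); verify this against the man page on your system. The function names are hypothetical.

```shell
# Translate an "xfs_repair -n" exit status into a short description.
# Assumption (from xfs_repair(8)): 0 = no corruption, 1 = corruption detected.
describe_check_status() {
    case $1 in
        0) echo "clean" ;;
        1) echo "corruption detected" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Run the no-modify check on an unmounted device and describe the result.
check_fs() {
    status=0
    xfs_repair -n "$1" >/dev/null 2>&1 || status=$?
    describe_check_status "$status"
}
```

Calling check_fs with a device name prints one of the descriptions; the phase output is discarded in this sketch, so rerun xfs_repair -n by hand to see the details.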
The command line for xfs_check is:
# xfs_check device
device is the disk or volume device for the filesystem.
If no consistency problems were found, xfs_check returns without displaying any output, as shown in the following example:
# xfs_check /dev/xscsi/pci02.02.0-1/target3/lun0/part1
#
If a problem is reported, use xfs_repair -n to obtain more information. See “xfs_repair -n”.
For more information, see the xfs_check(8) man page.
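Because xfs_check is silent when it finds no problems, a script can treat empty output as a clean result. The following sketch shows one way to do that; the function name classify_check_output is a hypothetical helper, and the caller is assumed to capture the command's output first.

```shell
# Classify captured xfs_check output: empty output means no problems were
# reported, anything else means the filesystem needs a closer look.
classify_check_output() {
    if [ -z "$1" ]; then
        echo "consistent"
    else
        echo "problems reported"
    fi
}
```

A typical call would pass the captured output, for example classify_check_output "$(xfs_check device 2>&1)", where device is the device file of the unmounted filesystem.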
The xfs_repair command checks XFS filesystem consistency and sometimes repairs problems that are found. This section discusses the following:
Caution: If you suspect problems with the root filesystem, you should use a boot disk or an alternate boot disk to run xfs_repair.
The xfs_repair command (without the -n option) checks XFS filesystem consistency and, if problems are detected, also corrects them if possible. The filesystem to be checked and repaired must have been unmounted cleanly using normal system administration procedures (the umount command or system shutdown), not as a result of a crash or system reset. If the filesystem has not been unmounted cleanly, mount it and unmount it cleanly before running xfs_repair.
The command line for xfs_repair when you want it to repair any inconsistencies it finds is:
# xfs_repair device
device is the disk or volume device for the filesystem. It must not be mounted.
The following example shows the output you see from running xfs_repair on a clean filesystem:
# xfs_repair /dev/xscsi/pci02.02.0-1/target3/lun0/part1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        ...
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        ...
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
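Because the device must not be mounted when xfs_repair runs, a script may want to confirm that before starting. The following is a minimal sketch that looks for the device in a mounts table. The function name is_mounted is hypothetical, and the default table /proc/mounts is a Linux-specific assumption. The check compares device names literally, so a device mounted under a different name (for example, through a symbolic link) would not be detected.

```shell
# Return 0 if the device is listed as a mounted source in the mounts table,
# 1 otherwise. The table defaults to /proc/mounts (Linux-specific assumption).
is_mounted() {
    device=$1
    table=${2:-/proc/mounts}
    # The first field of each line is the mounted device.
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$table"
}
```

A script could then refuse to continue when is_mounted succeeds for the device it is about to repair.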
Some common error messages from xfs_repair and the repairs that it performs are the following:
If xfs_repair has put files and directories in a filesystem's lost+found directory and you do not remove them, the next time you run xfs_repair it temporarily disconnects the inodes for those files and directories. They are reconnected before xfs_repair terminates. As a result of the disconnected inodes in lost+found, you see output like this:
Phase 1 - find and verify superblock...
Phase 2 - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        ...
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - deleting existing "lost+found" entry
        - check for inodes claiming duplicate blocks...
        - agno = 0
imap claims in-use inode 242000 is free, correcting imap
        - agno = 1
        - agno = 2
        ...
Phase 5 - rebuild AG headers and trees...
        - reset superblock counters...
Phase 6 - check inode connectivity...
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 242000, moving to lost+found
Phase 7 - verify and correct link counts...
done
In this example, inode 242000 was an inode that was moved to lost+found during a previous xfs_repair run. This run of xfs_repair found that the filesystem is consistent. If the lost+found directory had been empty, in phase 4 only the messages about clearing and deleting the lost+found directory would have appeared. The imap claims and disconnected inode messages appear (one pair of messages per inode) if there are inodes in the lost+found directory.
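To see at a glance which inodes were moved to lost+found, the disconnected inode messages can be filtered out of saved xfs_repair output. The following is a small sketch; the function name disconnected_inodes is hypothetical, and the pattern assumes the message wording shown in the example above.

```shell
# Read xfs_repair output on standard input and print the number of each
# inode reported as moved to lost+found, one per line.
disconnected_inodes() {
    sed -n 's/.*disconnected inode \([0-9][0-9]*\), moving to lost+found.*/\1/p'
}
```

For example, saving the output of a repair run to a file and piping that file through disconnected_inodes lists the affected inode numbers.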
If xfs_repair fails to repair the filesystem successfully, try giving the same xfs_repair command twice more; xfs_repair may be able to make more repairs on successive runs. If xfs_repair fails to fix the consistency problems in three tries, your next step depends upon where it failed:
If xfs_repair failed in phase 1, you must restore lost files from backups.
If xfs_repair failed in phase 2 or later, you may be able to restore files from the disk by backing up and restoring the files on the filesystem.
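The three-attempt rule can be sketched as a small retry helper. The function name retry3 is hypothetical, and the sketch assumes that the command's exit status is nonzero when it fails; check the xfs_repair(8) man page for the exit status convention on your system before relying on it.

```shell
# Run a command up to three times, stopping at the first success.
# Returns 0 on success, 1 if all three attempts fail.
retry3() {
    attempt=1
    while [ "$attempt" -le 3 ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
    done
    return 1
}
```

A call such as retry3 xfs_repair device would rerun the repair until it succeeds or three attempts have been made.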
If xfs_repair failed in phase 2 or later, follow these steps:
Mount the filesystem read-only using mount -r.
Make a filesystem backup with xfsdump.
Use mkfs.xfs to make a new filesystem on the same disk partition or logical volume.
Restore the files from the backup with xfsrestore.
See Chapter 6, “Backup and Recovery Procedures” for information about xfsdump and xfsrestore.
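The four steps above can be sketched as a function that prints the command sequence for review. The function name salvage_plan and its three arguments are hypothetical placeholders, and the dump file is assumed to reside on a different filesystem from the one being rebuilt. The umount and second mount commands are added here because the device must be released before mkfs.xfs runs and remounted before the restore. The options shown (-l and -f for xfsdump, -f for mkfs.xfs and xfsrestore) are described in the respective man pages.

```shell
# Print the salvage sequence for a filesystem that xfs_repair cannot fix.
# device, mountpoint, and dumpfile are placeholders chosen by the caller;
# dumpfile must be on a different filesystem.
salvage_plan() {
    device=$1
    mountpoint=$2
    dumpfile=$3
    echo "mount -r $device $mountpoint"            # step 1: read-only mount
    echo "xfsdump -l 0 -f $dumpfile $mountpoint"   # step 2: level 0 dump
    echo "umount $mountpoint"                      # release the device
    echo "mkfs.xfs -f $device"                     # step 3: new filesystem
    echo "mount $device $mountpoint"               # remount for the restore
    echo "xfsrestore -f $dumpfile $mountpoint"     # step 4: restore files
}
```

Printing the commands first makes it possible to confirm the device name before mkfs.xfs destroys the old filesystem.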
If a filesystem is damaged to the extent that you are unable to mount the filesystem successfully in the standard fashion, you may be able to recover some of its data by mounting the filesystem with the -o norecover option of the mount command. This option mounts the filesystem without running log recovery. You must mount the filesystem as read-only when you use this option.
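As an illustration, the following sketch builds the mount command for this case, combining the read-only and norecover options. The function name norecover_mount_cmd and its arguments are hypothetical placeholders. The function only prints the command.

```shell
# Print the command that mounts a damaged filesystem read-only without
# running log recovery. Both arguments are placeholders.
norecover_mount_cmd() {
    device=$1
    mountpoint=$2
    echo "mount -o ro,norecover $device $mountpoint"
}
```

Because log recovery is skipped, recent changes recorded only in the log may be missing from the mounted view; copy off whatever data is readable and then unmount the filesystem.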