System recovery (2.202.2)

This topic includes being able to properly manipulate a Linux system during both the boot process and recovery mode. This includes using both the init utility and init= kernel options.

Key files, terms and utilities include:

LILO
init
inittab
mount
fsck

GRUB explained

The boot loader loads the operating system kernel and transfers control to it. There are two widely used boot loader programs for Linux: LILO and GRUB. LILO is described in more detail in the section called “Booting from disk or partition”. This section discusses GRUB.

GRUB understands filesystems and kernel executable formats, whereas LILO must be given the physical location of the kernel on disk (track/sector/byte). With GRUB, the kernel is loaded by specifying the drive, partition and filename instead of the track/sector/byte offset.

GRUB is able to boot many operating systems, both free and proprietary ones. Open operating systems, like FreeBSD, NetBSD, OpenBSD, and Linux, are supported by GRUB directly. Proprietary kernels (e.g. DOS, Windows and OS/2) are supported using GRUB's chain-loading function. Chain-loading means that GRUB boots the system and in turn loads and runs the proprietary system's boot loader, which then boots the operating system.

GRUB features both a menu interface and a command-line interface. The command-line interface allows you to execute commands to select a root device (root command), load a kernel from it (kernel command), if necessary load some additional kernel modules (module or modulenounzip command) and subsequently boot the kernel (boot command).

The menu interface offers a method for sequential execution of command-line commands. While booting, both interfaces are available. On boot the menu is displayed, and you can simply choose one of the menu entries. Upon choosing such an entry, a number of preconfigured commands will be executed. You can also switch to the command-line interface and manually specify the various parameters.

GRUB additionally allows on-the-fly editing of the menu entries. The commands for the menu entries are listed in the file /boot/grub/menu.lst. Because GRUB is capable of accessing the filesystem directly, any change in that file is reflected immediately. This differs from the method LILO uses. When LILO users make changes to the boot menu, these only become available at boot time after the lilo command has been run. A common problem is that running lilo after making changes is forgotten. Another common problem with LILO is that any error in the configuration file may render your system unbootable, in which case you need a rescue boot disk to correct the problem. GRUB users simply choose the proper kernel (which can be found by GRUB itself, since it understands most common filesystems), boot the system and correct the problem.
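As an illustration, a minimal menu.lst might look like the sketch below. The shell functions only write the sample to a scratch file and list its entries; the kernel and initrd file names, the root= device and the partition numbers are assumptions chosen for the example, not values taken from a real system.

```shell
#!/bin/sh
# Write a sample GRUB (legacy) menu file to the path given as $1.
# All device names and file names below are hypothetical.
write_sample_menu() {
    cat > "$1" <<'EOF'
# Boot the first entry by default after 10 seconds
default 0
timeout 10

title Linux
root (hd0,1)
kernel /boot/vmlinuz root=/dev/hda2 ro
initrd /boot/initrd.img

title Linux (previous kernel)
root (hd0,1)
kernel /boot/vmlinuz.old root=/dev/hda2 ro

# A proprietary system is started through its own boot loader
title Windows
rootnoverify (hd0,0)
makeactive
chainloader +1
EOF
}

# List the menu entries defined in a menu file.
list_menu_titles() {
    sed -n 's/^title[[:space:]]*//p' "$1"
}

# Example: write the sample to a scratch file and show its entries.
sample=$(mktemp)
write_sample_menu "$sample"
list_menu_titles "$sample"
rm -f "$sample"
```

Each title line starts a menu entry, and the lines below it are the same commands you could type at the GRUB command line.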

A GRUB shell is available that emulates the boot loader and can be used to install it. It also comes in handy to inspect your current setup and modify it. To start it (as root), simply type grub. In the following example we display the help screen:

# grub
grub> help
blocklist FILE                         boot
cat FILE                               chainloader [--force] FILE
color NORMAL [HIGHLIGHT]               configfile FILE
device DRIVE DEVICE                    displayapm
displaymem                             find FILENAME
geometry DRIVE [CYLINDER HEAD SECTOR [ halt [--no-apm]
help [--all] [PATTERN ...]             hide PARTITION
initrd FILE [ARG ...]                  kernel [--no-mem-option] [--type=TYPE]
makeactive                             map TO_DRIVE FROM_DRIVE
md5crypt                               module FILE [ARG ...]
modulenounzip FILE [ARG ...]           pager [FLAG]
partnew PART TYPE START LEN            parttype PART TYPE
quit                                   reboot
root [DEVICE [HDBIAS]]                 rootnoverify [DEVICE [HDBIAS]]
serial [--unit=UNIT] [--port=PORT] [-- setkey [TO_KEY FROM_KEY]
setup [--prefix=DIR] [--stage2=STAGE2_ terminal [--dumb] [--timeout=SECS] [--
testvbe MODE                           unhide PARTITION
uppermem KBYTES                        vbeprobe [MODE]

grub >_

We already discussed the root, kernel, module and modulenounzip commands briefly. GRUB has many commands to assist engineers with their work, for example the blocklist command, which can be used to find out on which disk blocks a file is stored, or the geometry command, which can be used to find out the disk geometry. You can create new (primary) partitions using the partnew command, load an initrd image using the initrd command, and many more. All options are described in the GRUB documentation. GRUB is part of the GNU software library and as such is documented using the info system. On most systems there is a limited man page available as well.

GRUB uses its own syntax to describe hard disks. Device names need to be enclosed in parentheses, e.g.

(fd0) 

denotes the floppy disk, and

(hd0,1)

denotes the second partition on the first hard disk. Disks and partitions are counted starting at zero, so (hd0) is the first disk and partition 1 is the second partition on it.
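The zero-based numbering is easy to get wrong, so a small helper can make the conversion explicit. The sketch below assumes the classic IDE names (/dev/hda, /dev/hdb, ...) and that the BIOS drive order matches the Linux letter order, which is not guaranteed. linux_to_grub is a hypothetical helper, not part of GRUB.

```shell
#!/bin/sh
# Convert a Linux IDE device name to GRUB (legacy) notation.
# Assumes hda maps to BIOS drive 0, hdb to drive 1, and so on.
linux_to_grub() {
    dev=${1#/dev/}                      # strip the /dev/ prefix: hda2
    case $dev in
        hd[a-z]*) ;;
        *) echo "unsupported device: $1" >&2; return 1 ;;
    esac
    rest=${dev#hd}                      # a2
    part=${rest#?}                      # 2 (empty for a whole disk)
    letter=${rest%"$part"}              # a
    case $part in
        *[!0-9]*|0*) echo "bad partition number in: $1" >&2; return 1 ;;
    esac
    # printf with a leading quote yields the character code ('a' is 97)
    disk=$(( $(printf '%d' "'$letter") - 97 ))
    if [ -z "$part" ]; then
        echo "(hd$disk)"
    else
        echo "(hd$disk,$(( $part - 1 )))"   # GRUB counts partitions from zero
    fi
}

linux_to_grub /dev/hda2    # prints (hd0,1)
linux_to_grub /dev/hdb     # prints (hd1)
```

When the BIOS order differs from the Linux naming, the mapping has to come from /boot/grub/device.map rather than from the drive letter.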

GRUB uses the computer's BIOS to find out which hard drives are available, but it cannot always figure out the relation between Linux device filenames and the BIOS drives. The special file /boot/grub/device.map can be created to map these, e.g.:

(fd0)  /dev/fd0
(hd0)  /dev/hda

Note that when you are using software RAID-1 (mirroring), you need to set up GRUB on both disks. At boot time the software RAID system is not yet available, so booting can only be done from a single disk. If you set up GRUB only on the first disk and that disk is damaged, the system will not be able to boot.
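One common way to do this with GRUB legacy is to run setup once per disk from the GRUB shell, using the device command to map each disk in turn as (hd0) so that it can boot on its own if the other disk fails. The sketch below only prints the GRUB shell commands. The disk names /dev/hda and /dev/hdb, and the assumption that the files under /boot/grub live on the first partition of each disk, are illustrative.

```shell
#!/bin/sh
# Print GRUB (legacy) shell commands that install the boot loader on one
# member disk of a mirror. $1 is the Linux disk name (hypothetical here).
# Assumes the files under /boot/grub are on the first partition of the disk.
grub_setup_commands() {
    cat <<EOF
device (hd0) $1
root (hd0,0)
setup (hd0)
quit
EOF
}

# Show the commands for both halves of the mirror.
for disk in /dev/hda /dev/hdb; do
    grub_setup_commands "$disk"
done
```

To apply them, the output for one disk at a time could be piped into grub --batch as root. Review the commands first, since setup overwrites the MBR of the disk that is mapped to (hd0).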

The initial boot process is just the same as it is for LILO. Upon boot, the BIOS accesses the initial sector of the hard disk, the so-called MBR (Master Boot Record), loads the data found there in memory and transfers execution to it. If GRUB is used, the MBR contains a copy of the first stage of GRUB, which tries to load stage 2.

To be able to load stage 2, GRUB needs to have access to code to handle the filesystem(s). There are many filesystem types and the code to handle them will not fit within the 512 byte MBR, even less so since the MBR also contains the partition table. The GRUB parts that deal with filesystems are therefore stored in the so-called DOS compatibility region. That region consists of sectors on the same cylinder where the MBR resides (cylinder 0). In the old days, when disks were addressed using the CHS (Cylinder/Head/Sector) specification, the MBR typically would load DOS. DOS requires that its image is on the same cylinder. Therefore, by tradition, the first cylinder on a disk is reserved and it is this space that GRUB uses to store the filesystem code. That section is referred to as stage 1.5.

In Linux, the grub-install command is used to install stage 1 to either the MBR or a partition. GRUB's configuration file (by default /boot/grub/menu.lst), the stage2 file and the other GRUB files must be in a usable partition. If the configuration file becomes unavailable, GRUB will present the end user with its command line interface instead of the menu.

Stage 2 contains most of the boot logic. It presents a (graphical) menu to the end-user and an additional command prompt, where the user can manually specify boot parameters. GRUB is typically configured to automatically load a particular kernel after a timeout period. Once the end-user has made a selection, GRUB loads the selected kernel into memory and passes control on to the kernel. At this stage GRUB can pass control of the boot process to another loader using chain-loading if required by the operating system.

Influencing the regular boot process

The regular boot process is the process that normally takes place when (re)booting the system. This process can be influenced by entering something at the LILO prompt. What can be influenced will be discussed in the following sections, but first we must activate the prompt. The LILO prompt is activated if LILO sees that one of the Shift, Ctrl or Alt keys is pressed, or CapsLock or ScrollLock is set after LILO is loaded.

Choosing another kernel

If you've just compiled a new kernel and you're experiencing difficulties with the new kernel, chances are that you'd like to revert to the old kernel. Of course you've kept the old kernel and added a label to lilo.conf which will show up if you press Tab or ? on the LILO prompt. Choose the old kernel, then solve the problems with the new one.

Booting into single user mode or a specific runlevel

This can be useful if, for instance, you've installed a graphical environment which isn't functioning properly. You either don't see anything at all or the system doesn't reach a stable state because it keeps trying to start X over and over again.

Booting into single user mode or into another runlevel where the graphical environment isn't running will give you access to the system so you can correct the problem. To boot into single user mode type the name of the label corresponding to the kernel you'd like started followed by an “s”, “S” or the word “single”. If the label is “Linux”, you can type one of the following after the LILO prompt:

LILO: Linux s
LILO: Linux S
LILO: Linux single
          

If you have defined a runlevel, let's say runlevel 4, which is a text-only runlevel, you can type the following line to boot into runlevel 4 and regain access to your system:

LILO: Linux 4
          

Passing parameters to the kernel

If a device doesn't work

A possible cause can be that the device driver in the kernel has to be told to use another irq and/or another I/O port. BTW: This is only applicable if support for the device has been compiled into the kernel, not if you're using a loadable module.

As an example, let's pretend we've got a system with two identical ethernet-cards for which support is compiled into the kernel. By default only one card will be detected, so we need to tell the driver in the kernel to probe for both cards. Suppose the first card is to become eth0 with an address of 0x300 and an irq of 5 and the second card is to become eth1 with an irq of 11 and an address of 0x340. This is done at the LILO bootprompt as shown below:

LILO: Linux ether=5,0x300,eth0 ether=11,0x340,eth1
						

Be careful to put white space only between the two “ether=” parameters, not within them.

If you've lost the root password

This never happens because, being a well-trained system administrator, you've written the password down on a piece of paper, put it in an envelope and placed it in the vault.

In case you haven't done this, shame on you!

The trick is to try to get a shell with root-privileges but without having to type the root password. As soon as that's accomplished, you can remove the root password by clearing root's second field in the file /etc/passwd (the fields are separated by colons), do a reboot, login as root and use passwd to set the new root password.

First reboot the system and type the following at the LILO boot prompt:

LILO: Linux init=/bin/bash
						

This will give you the shell, but it's possible that the default editor vi can't be found because it's located on a filesystem that is not mounted yet (which is the case on my system, where vi is located in /usr/bin and /usr is not mounted at this point). You can do a mount -a now, which mounts all the filesystems mentioned in /etc/fstab except the ones with the noauto keyword. However, you won't see that the filesystems have been mounted when you type mount afterwards. The reason is that mount and umount keep a list of mounted filesystems in the file /etc/mtab, and that file can't be updated while the root (/) filesystem, on which /etc/mtab resides, is mounted read-only.

Keeping all this in mind, the steps to follow after we've acquired the shell are:

# mount -o remount,rw /   # remount / readable and writable
# mount -a                # mount all
# mount                   # show mounted filesystems
# vi /etc/passwd          # and clear the second field for root
# sync                    # write buffers to disk
# umount -a               # unmount filesystems
# mount -o remount,ro /   # remount / read-only again
<Ctrl><Alt><Del>
login: root               # login as root without password
# passwd                  # type the new password

Ahem, now is a good time to write down the root password ...
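If no editor is available in the emergency shell, the second field can also be cleared with sed. The sketch below is one possible way to do that; clear_root_password is a hypothetical helper, and the demonstration runs on a scratch file with made-up entries rather than on the real /etc/passwd.

```shell
#!/bin/sh
# Print the contents of a passwd-style file with root's password field
# (the second colon-separated field) emptied. The file itself is not changed.
clear_root_password() {
    sed 's/^root:[^:]*:/root::/' "$1"
}

# Demonstration on a scratch copy with made-up entries, not on /etc/passwd.
demo=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' > "$demo"
clear_root_password "$demo"
rm -f "$demo"
```

On a real system you would first copy /etc/passwd to a backup and then redirect the function's output from that backup to /etc/passwd, once / has been remounted read-write. On systems that use shadow passwords the password is kept in the second field of /etc/shadow, so that file is likely the one to edit instead.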

The Rescue Boot process

When fsck is started but fails

During boot, on my Debian system, fsck is started by /etc/rcS.d/S30check.fs. All filesystems are checked based on the contents of /etc/fstab.

If the command fsck returns an exit status larger than 1, the command has failed. The exit status is the sum of the values of one or more of the following conditions:

0    - No errors
1    - File system errors corrected
2    - System should be rebooted
4    - File system errors left uncorrected
8    - Operational error
16   - Usage or syntax error
128  - Shared library error
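
Because several conditions can be combined in a single status, it can help to decode the value bit by bit. The following is a minimal sketch that uses only the values from the table above; explain_fsck_status is a hypothetical helper name.

```shell
#!/bin/sh
# Print one line for every condition encoded in an fsck exit status.
# The status is a sum of the values in the table, so each bit is tested.
explain_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "No errors"
        return 0
    fi
    for entry in \
        "1:File system errors corrected" \
        "2:System should be rebooted" \
        "4:File system errors left uncorrected" \
        "8:Operational error" \
        "16:Usage or syntax error" \
        "128:Shared library error"
    do
        bit=${entry%%:*}     # numeric value before the colon
        text=${entry#*:}     # description after the colon
        if [ $(( $status & $bit )) -ne 0 ]; then
            echo "$text"
        fi
    done
}

# 5 = 1 + 4: some errors were corrected, others were left uncorrected
explain_fsck_status 5
```

After a manual check you could call it as explain_fsck_status $? directly after the fsck command.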
					

If the command has failed you'll get a message:

fsck failed. Please repair manually

"CONTROL-D" will exit from this shell and
continue system startup.
					

If you don't press Ctrl-D but enter the root password instead, you'll get a shell (in fact, /sbin/sulogin is launched). You should then be able to run fsck and fix the problem, provided the root filesystem is mounted read-only.

Alternatively, as is described in the next section, you can boot from a home-made disc or from the distribution boot media.

If your root (/) filesystem is corrupt

Using a home-made bootfloppy

The default root filesystem is compiled into the kernel. This means that if a kernel has been built on a system that uses /dev/hda2 as the root filesystem and you've put the kernel on a floppy and boot from floppy, the kernel will still try to mount /dev/hda2 as the root filesystem.

The only way to circumvent this is to tell the kernel to mount another device as the root filesystem. Let's say we've cooked up a floppy with a root filesystem on it and a kernel that is an exact copy of our current kernel. All we have to do is boot from floppy and tell the kernel to use the root filesystem from the floppy. This is done at the LILO prompt:

LILO: Linux root=/dev/fd0
					  

Using the distribution's bootmedia

A lot of distributions come with two floppy disks and one or more CDs. One of the floppy disks is called the “bootdisk” or the “rescuedisk”; the other is called the “rootdisk”.

You can boot from the “bootdisk” and the system will ask you to insert the “rootdisk”. After both disks have been processed, you've got a running system with the root filesystem (/) in a RAMdisk.

In case you're wondering why we didn't boot from CD: we could have, if the system supports booting from CD. Remember to set the boot order in the BIOS to try to boot from CD-ROM first and then from HDD.

As soon as we've booted from the disks or from CD, we can get a shell with root privileges, without having to type the root password, by pressing Alt-F2. It is now possible to manually run fsck on the unmounted filesystems.

Let's assume your root partition was /dev/hda2. You can then run a filesystem check on the root filesystem by typing fsck -y /dev/hda2. The “-y” flag prevents fsck from asking questions which you must answer (this can result in a lot of Enters) and causes fsck to use “yes” as an answer to all questions.

Although your current root (/) filesystem is completely in RAM, you can mount a filesystem from harddisk on an existing mountpoint in RAM, such as /target or you can create a directory first and then mount a harddisk partition there.

After you've corrected the errors, don't forget to unmount (umount) the filesystems you've mounted before you reboot the system. Otherwise you'll get a message during boot that one or more filesystems have not been cleanly unmounted, and fsck will try to fix them again.
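
The whole rescue sequence can be sketched as a short script. This is an illustration under stated assumptions: /dev/hda2 and /target are the example names used above, run and rescue_check are hypothetical helpers, and the DRY_RUN variable makes the script print the commands instead of executing them.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# Check an unmounted partition, mount it for inspection and unmount it again.
# $1 is the partition, $2 the mount point; both are example values.
rescue_check() {
    run fsck -y "$1"
    # Status 0 (clean) and 1 (errors corrected) both allow us to continue
    [ $? -le 1 ] || return 1
    run mkdir -p "$2" || return 1
    run mount "$1" "$2" || return 1
    # ... inspect or repair files under the mount point here ...
    run umount "$2"
}

# Print the commands only; nothing is checked or mounted.
DRY_RUN=1
rescue_check /dev/hda2 /target
```

Clearing DRY_RUN (as root, in the rescue environment) would execute the same commands for real.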

Copyright Snow B.V. The Netherlands