Troubleshooting LILO (2.214.4)

Revision: $Revision: 1.7 $

Candidate should be able to: determine specific stage failures and corrective techniques.

Key files, terms and utilities include:

Know meaning of L, LI, LIL, LILO, and scrolling 010101 errors
Know the different LILO install locations, MBR, /dev/fd0, or primary/extended partition
/boot/boot.b
Know significance of /boot/boot.### files

Resources: the man pages for the various commands, Wirzenius98, Yap98.

Booting from CD-ROM and networks

As of this writing most BIOSes let you choose to boot from hard disk, floppy, network or CD-ROM. To give an overview, these alternatives are outlined below. Since most systems boot from hard disk, that process is described in more detail further on.

Booting from CD-ROM requires that your hardware supports the El Torito standard. El Torito is a specification that says how a CD-ROM should be formatted so that you can boot from it directly. A bootable CD-ROM contains a floppy-disk image in its initial sectors. This image is treated like a floppy by the BIOS and booted from.

Booting from the network is done using the Bootstrap Protocol (BOOTP) or the Dynamic Host Configuration Protocol (DHCP). DHCP is an evolution of BOOTP. In most cases the client has no means to address the bootserver directly, so the client broadcasts a UDP packet over the network. Any bootserver that has information about the client stored will answer. If more than one server responds, the client will select one of them. Since the requesting client does not yet have a valid IP address, the unique hardware (MAC) address of its network card is used to identify it to the BOOTP server(s) in your network. The BOOTP server(s) will issue the IP address, a hostname, the address of the server where the image of the kernel to boot can be found and the name of that image. The client configures its network accordingly and downloads the specified image from that server using the Trivial File Transfer Protocol (TFTP). TFTP runs over UDP and is often considered to be an unsafe protocol, since there is no authentication. However, its triviality also allows compact implementations that can be stored in a boot-ROM, for example a PC BIOS. After the kernel image has been retrieved, it will be started the usual way. Often, the root filesystem is located on another server too, and NFS is used to mount it. This requires a Linux kernel that allows the root filesystem to be on NFS.
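To give an impression of how small TFTP is, the following Python sketch builds the read request a boot client sends to ask for its kernel image. The packet layout (a two-byte opcode 1, followed by the zero-terminated file name and transfer mode) is taken from RFC 1350; the function name and the image name "vmlinuz" are made up for this illustration.

```python
import struct

TFTP_OPCODE_RRQ = 1  # "read request" opcode from RFC 1350


def build_tftp_rrq(filename, mode="octet"):
    """Return the bytes of a TFTP read request for `filename`."""
    return (
        struct.pack("!H", TFTP_OPCODE_RRQ)    # 2-byte opcode, network byte order
        + filename.encode("ascii") + b"\x00"  # zero-terminated file name
        + mode.encode("ascii") + b"\x00"      # zero-terminated transfer mode
    )


# "vmlinuz" is a hypothetical image name handed out by the BOOTP server
packet = build_tftp_rrq("vmlinuz")
print(packet)
```

The whole request fits in a handful of bytes and needs no login or session setup, which is why it is both easy to fit in a boot-ROM and unauthenticated.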

Booting from disk or partition

Booting from floppy or hard disk is the common case. In previous chapters we already described the boot process used to boot from floppy. However, there is a slight difference between floppy and hard disk boots. Both contain a bootsector, located at cylinder 0, head 0, sector 1. On a floppy the boot sector often contains just the boot code to be loaded in memory and executed.

Booting from hard disk requires some additional functionality: a hard disk can contain one or more partitions, in which case the boot program needs to find out from which partition to boot. A partition in turn will contain its own bootcode sector. The sector located at cylinder 0, head 0, sector 1 is called the master boot record (MBR).

Information about hard disk partitions is typically stored in partition tables, which are data-structures stored on a special partition sector. There are various types of partition tables, for example IRIX/SGI, Sun or DOS. It depends on the hardware in use which type of partition table is used. In this book we focus on the classical PC (DOS) partition table, which is typical for PC hardware. By default Linux accepts and uses DOS partition tables. Support for other partition types can be enabled in the kernel. On PC hardware the partition table is part of the MBR. You can use the fdisk command to print out your current partition table or to create a new one.

On a PC the BIOS starts by loading the sector at cylinder 0, head 0, sector 1 into memory. The first 446 bytes of that sector comprise the boot program, which is executed next. It is up to you which program to use to boot your system. By writing your own boot program you could continue the boot process any way you want. But there are many fine boot programs available, for example the DOS loader, the Linux Loader (LILO) and GRUB. Sometimes a Windows boot loader is used (e.g. Bootmagic).
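The layout of that first sector can be sketched in a few lines of Python. The sector is 512 bytes long: 446 bytes of boot code, 64 bytes of partition table and a two-byte signature (0x55 0xAA) that marks the sector as bootable. The signature is not discussed in this text but is part of the classical PC layout; the function name and the synthetic sector are illustrative only.

```python
BOOT_CODE_SIZE = 446       # the boot program
PARTITION_TABLE_SIZE = 64  # four 16-byte entries
SIGNATURE = b"\x55\xaa"    # last two bytes of a valid boot sector


def split_mbr(sector):
    """Split a 512-byte MBR into boot code, partition table and signature."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    boot_code = sector[:BOOT_CODE_SIZE]
    table = sector[BOOT_CODE_SIZE:BOOT_CODE_SIZE + PARTITION_TABLE_SIZE]
    signature = sector[510:]
    return boot_code, table, signature


# Synthetic, empty sector for illustration. On a real system the bytes
# could be read (as root) from the start of the disk device, e.g. /dev/hda.
example = bytes(446) + bytes(64) + SIGNATURE
boot_code, table, signature = split_mbr(example)
print(len(boot_code), len(table), signature == SIGNATURE)
```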

DOS for example uses a loader program that scans the partition table for a bootable partition. When an entry marked active is found, the first sector of that partition is loaded into memory and executed. That code in turn continues the loading of the operating system.

Linux can install a loader program too. Often this will be LILO, the Linux Loader. LILO uses a two-stage approach: the boot sector has a boot program that loads a boot file, the second stage boot program. That program presents you with a simple menu-like interface, which either prompts you for the operating system to load or optionally times out and loads the default system. Note that the code in the MBR is limited: it does not have any knowledge about concepts like filesystems, let alone filenames. It can access the hard disk, but needs the BIOS to do so. And the BIOS is not capable of understanding anything but CHS (Cylinder/Head/Sector). Hence, to find its boot program, the code in the MBR needs the exact CHS specification of its location. These specifications are figured out by /sbin/lilo when it installs the boot sector.

The second stage LILO boot program needs information about the following items:

  • where /boot/boot.b can be found; it contains the second stage boot program. The second stage program will be loaded by the initial boot program in the MBR;

  • the /boot/map file, which contains information about the location of kernels, boot sectors etc.; this information is used mostly by the second stage boot program; see below for a more detailed description of the map file;

  • the location of kernel(s) you want to be able to boot

  • the boot sectors of all operating systems it boots

  • the location of the startup message, if one has been defined

Remember, to be able to access these files, the BIOS needs the CHS (Cylinder/Head/Sector) information to load the proper block. This also holds true for the code in the second stage loader. LILO therefore needs a so-called map file that maps filenames into CHS values. This file contains information for all files that LILO needs to know of during boot, for example the locations of the kernel(s), the command line to execute on boot, and more. The default name for the map file is /boot/map. /sbin/lilo uses the file /etc/lilo.conf to determine what files to map and what bootprogram to use, and creates a map file accordingly.

More about partition tables

The DOS partition table is embedded in the MBR at cylinder 0, head 0, sector 1, at offset 446 (0x1BE) and on, directly after the 446 bytes of boot code. There are four entries in a DOS partition table. Only one of them can be marked as active: the boot program normally will load the first sector of the active partition in memory and deliver control to it.
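What a DOS-style boot program does with this table can be sketched as follows. Counting from zero, the table starts at byte 446 of the sector, each entry is 16 bytes long, and an active entry has the value 0x80 in its first byte. The function name and the synthetic MBR are assumptions made for this example.

```python
TABLE_OFFSET = 446  # start of the partition table inside the MBR
ENTRY_SIZE = 16     # size of one partition table entry
ACTIVE_FLAG = 0x80  # first byte of an entry that is marked active


def find_active_partition(mbr):
    """Return the number (1-4) of the active primary partition, or None."""
    for index in range(4):
        boot_flag = mbr[TABLE_OFFSET + index * ENTRY_SIZE]
        if boot_flag == ACTIVE_FLAG:
            return index + 1
    return None


# Synthetic MBR in which only the second entry carries the active flag
mbr = bytearray(512)
mbr[TABLE_OFFSET + 1 * ENTRY_SIZE] = ACTIVE_FLAG
print(find_active_partition(mbr))
```

If the function returns None, a loader that relies on the active flag has nothing to boot, which matches the "(nothing)" case described under LILO errors when LILO lives in a partition boot sector.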

An entry in the partition table contains 16 bytes, as shown in the following figure:

Figure 14.1. A (DOS) partition table entry

 
|boot? ||start                 ||type  ||end                   |
|      ||cyl      |head   |sect||      ||cyl      |head   |sect|
|------||--------||------||----||------||--------||------||----|


|start in LBA                  ||size in sectors               |
|------||------||------||------||------||------||------||------|

As you can see, each partition entry contains the start and end location of the partition specified as the Cylinder/Head/Sector of the hard disk. Note that the Cylinder field has 10 bits, therefore the maximum number of cylinders that can be specified is 2^10 = 1024. BIOSes traditionally use CHS specifications, hence older BIOSes are not capable of accessing data stored beyond the first 1024 cylinders of the disk.
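The 10-bit limit becomes visible when the three CHS bytes of an entry are decoded. In the on-disk format the first byte holds the head, the low six bits of the second byte hold the sector, and the cylinder is assembled from the two high bits of the second byte plus the whole third byte. The Python sketch below decodes such a field; the function name is made up for the example.

```python
def decode_chs(raw):
    """Decode the 3-byte CHS field of a partition table entry."""
    head = raw[0]
    sector = raw[1] & 0x3F                      # low 6 bits of byte 1
    cylinder = ((raw[1] & 0xC0) << 2) | raw[2]  # 2 high bits + 8 low bits
    return cylinder, head, sector


# The largest cylinder and sector values that fit in the field
print(decode_chs(bytes([0xFE, 0xFF, 0xFF])))  # (1023, 254, 63)
```

Cylinders are numbered from 0, so 1023 is the last of the 1024 cylinders that can be expressed.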

As disks grew in size the partition/disk sizes could not be properly expressed using the limited capacity of the CHS fields anymore. An alternate method of addressing blocks on a hard disk was introduced: Logical Block Addressing (LBA). LBA addressing specifies sections of the disk by their block number relative to 0. A block can be seen as a 512 byte sector. The last 64 bits in a partition table entry contain the start of the partition as a 32-bit LBA address, followed by the size of the partition as a 32-bit number of sectors.
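A complete 16-byte entry can be unpacked with Python's struct module, as sketched below. The field order follows the figure above (boot flag, start CHS, type, end CHS, start in LBA, size in sectors); the multi-byte numbers are stored little-endian. The sample entry is made up for the example and describes a small active Linux partition.

```python
import struct

# boot flag, start CHS, type, end CHS, start LBA, size in sectors
ENTRY_FORMAT = "<B3sB3sII"


def parse_entry(raw):
    """Turn one 16-byte partition table entry into a dictionary."""
    boot, _start_chs, ptype, _end_chs, start_lba, sectors = struct.unpack(
        ENTRY_FORMAT, raw
    )
    return {
        "active": boot == 0x80,
        "type": ptype,
        "start_lba": start_lba,
        "sectors": sectors,
        "size_bytes": sectors * 512,  # a block is a 512 byte sector
    }


# Made-up entry: active, type 83 (Linux), starts at block 63
raw = struct.pack(
    ENTRY_FORMAT, 0x80, b"\x01\x01\x00", 0x83, b"\xfe\x3f\x0c", 63, 208782
)
entry = parse_entry(raw)
print(entry)
```

With 32 bits for the size and 512-byte sectors, a single entry can describe a partition of at most 2^32 x 512 bytes, or 2 TiB.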

Tip

Remember that your computer boots using the BIOS disk access routines. Hence, if your BIOS does not cope with LBA addressing you may not be able to boot from partitions beyond the 1024 cylinder boundary. For this reason people with large disks often create a small partition somewhere within the 1024 cylinder boundary, usually mounted on /boot, and put the boot program and kernel in there, so the BIOS can boot Linux from hard disk. Once loaded, Linux ignores the BIOS - it has its own disk access procedures which are capable of handling huge disks.
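Whether a partition stays within the boundary can be estimated from its LBA values, as in the following sketch. It assumes the commonly reported geometry of 255 heads and 63 sectors per track; your BIOS may report something else, and the function names and sample numbers are purely illustrative.

```python
def cylinder_of(lba, heads, sectors_per_track):
    """Return the cylinder number that contains block `lba`."""
    return lba // (heads * sectors_per_track)


def within_1024_cylinders(start_lba, size_in_sectors,
                          heads=255, sectors_per_track=63):
    """True if the whole partition lies on cylinders 0-1023."""
    last_lba = start_lba + size_in_sectors - 1
    return cylinder_of(last_lba, heads, sectors_per_track) < 1024


# A small /boot partition at the start of the disk (made-up numbers)
print(within_1024_cylinders(63, 208782))    # True
# A partition of 20 million sectors starting at the same block
print(within_1024_cylinders(63, 20000000))  # False
```

With this geometry the boundary lies at 1024 x 255 x 63 = 16450560 sectors, roughly 8.4 GB into the disk.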

The type field contains the type of the partition, which usually relates to the purpose the partition was intended for. To give an impression of the various types of partitions available, a screen dump of the List command within fdisk follows:

 0  Empty           17  Hidden HPFS/NTF 5c  Priam Edisk     a6  OpenBSD
 1  FAT12           18  AST Windows swa 61  SpeedStor       a7  NeXTSTEP
 2  XENIX root      1b  Hidden Win95 FA 63  GNU HURD or Sys b7  BSDI fs
 3  XENIX usr       1c  Hidden Win95 FA 64  Novell Netware  b8  BSDI swap
 4  FAT16 <32M      1e  Hidden Win95 FA 65  Novell Netware  c1  DRDOS/sec (FAT-
 5  Extended        24  NEC DOS         70  DiskSecure Mult c4  DRDOS/sec (FAT-
 6  FAT16           3c  PartitionMagic  75  PC/IX           c6  DRDOS/sec (FAT-
 7  HPFS/NTFS       40  Venix 80286     80  Old Minix       c7  Syrinx
 8  AIX             41  PPC PReP Boot   81  Minix / old Lin db  CP/M / CTOS / .
 9  AIX bootable    42  SFS             82  Linux swap      e1  DOS access
 a  OS/2 Boot Manag 4d  QNX4.x          83  Linux           e3  DOS R/O
 b  Win95 FAT32     4e  QNX4.x 2nd part 84  OS/2 hidden C:  e4  SpeedStor
 c  Win95 FAT32 (LB 4f  QNX4.x 3rd part 85  Linux extended  eb  BeOS fs
 e  Win95 FAT16 (LB 50  OnTrack DM      86  NTFS volume set f1  SpeedStor
 f  Win95 Ext'd (LB 51  OnTrack DM6 Aux 87  NTFS volume set f4  SpeedStor
10  OPUS            52  CP/M            93  Amoeba          f2  DOS secondary
11  Hidden FAT12    53  OnTrack DM6 Aux 94  Amoeba BBT      fd  Linux raid auto
12  Compaq diagnost 54  OnTrackDM6      a0  IBM Thinkpad hi fe  LANstep
14  Hidden FAT16 <3 55  EZ-Drive        a5  BSD/386         ff  BBT
16  Hidden FAT16    56  Golden Bow

Extended partitions

The design limitation that imposes a maximum of four partitions proved to be troublesome as disks grew larger and larger. Therefore, a work-around was invented: by specifying one of the partitions as a DOS Extended partition it in effect becomes a container that holds one or more partitions, aptly named logical partitions. The total size of all logical partitions within the extended partition can never exceed the size of that extended partition.

In principle Linux lets you create as many logical partitions as you want, restricted of course by the physical boundaries of the extended partition and by hardware limitations. The logical partitions are described in a linked list of sectors. The four primary partitions, present or not, get numbers 1-4. Logical partitions start numbering from 5. The main disk contains a partition table that describes the primary partitions. The extended partition contains the logical partitions; each of these is preceded by a partition table that describes that logical partition and holds a pointer to the next logical partition's partition table, see the ASCII art below:

Figure 14.2. Partition table setup

+-------------------------------------------------+
| Partition table                       /dev/hda  |
| +-----------------------------------------------|
| | Partition 1                         /dev/hda1 |
| |                                               |
| |-----------------------------------------------|
| | Partition 2                         /dev/hda2 |
| |                                               |
| |-----------------------------------------------|
| | Extended partition                  /dev/hda3 |
| | +---------------------------------------------|
| | | Extended partition table                    |
| | |---------------------------------------------|
| | | Partition 3                       /dev/hda5 |
| | |                                             |
| | |---------------------------------------------|
| | | Extended partition table                    |
| | |---------------------------------------------|
| | | Partition 4                       /dev/hda6 |
| | |                                             |
| |-----------------------------------------------|
| | Partition 5                         /dev/hda4 |
| |                                               |
+-------------------------------------------------+
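The linked list of extended partition tables can be followed with a short loop, sketched here in Python. The sketch assumes the usual on-disk convention: in each extended partition table the first entry describes a logical partition with a start relative to that table, and the second entry points to the next table with a start relative to the beginning of the extended partition. The function names, the read_sector callback and the two-table sample disk are made up for the example.

```python
import struct

TABLE_OFFSET = 446
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # the "extended" types in the fdisk list


def table_entries(sector):
    """Return (type, start_lba, size) for the four entries in a sector."""
    entries = []
    for i in range(4):
        raw = sector[TABLE_OFFSET + 16 * i:TABLE_OFFSET + 16 * (i + 1)]
        _, _, ptype, _, start, size = struct.unpack("<B3sB3sII", raw)
        entries.append((ptype, start, size))
    return entries


def logical_partitions(read_sector, extended_start):
    """Follow the chain; return (number, absolute start, size) tuples."""
    found = []
    table_lba = extended_start
    number = 5  # logical partitions are numbered from 5
    while True:
        entries = table_entries(read_sector(table_lba))
        ptype, start, size = entries[0]
        if ptype != 0:
            # first entry: the logical partition, relative to this table
            found.append((number, table_lba + start, size))
            number += 1
        next_type, next_start, _ = entries[1]
        if next_type not in EXTENDED_TYPES:
            break  # end of the linked list
        # second entry: next table, relative to the extended partition
        table_lba = extended_start + next_start
    return found


def make_table(*entries):
    """Build a synthetic table sector from (type, start, size) tuples."""
    sector = bytearray(512)
    for i, (ptype, start, size) in enumerate(entries):
        struct.pack_into("<B3sB3sII", sector, TABLE_OFFSET + 16 * i,
                         0, b"\0\0\0", ptype, b"\0\0\0", start, size)
    return bytes(sector)


# Hypothetical disk: extended partition at block 1000 with two logicals
disk = {
    1000: make_table((0x83, 63, 5000), (0x05, 6000, 4063)),
    7000: make_table((0x83, 63, 4000)),
}
print(logical_partitions(disk.__getitem__, 1000))
```

On the sample disk this yields partitions 5 and 6, corresponding to /dev/hda5 and /dev/hda6 in the figure. A robust tool would also guard against a chain that loops back on itself.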

The LILO install locations

LILO's first stage loader program can either be put in the MBR, or it can be put in any partition's boot sector. Of course, you could put it in both locations if you wanted to. For example, LILO in the MBR could let you decide whether to boot Windows, DOS or Linux; if Linux is booted, its boot sector could contain LILO's primary loader too, which would enable you to choose between different versions/configurations of the kernel.

The tandem Linux and Windows is frequently used to ease the migration of services to the Linux platform or to enable both Linux and Windows to run on the same computer. To dual boot Linux and Windows 95/98, you can install LILO on the master boot record. Windows NT and Windows 2000 require their own loader in the MBR. In these cases, you can install LILO in the Linux partition as a secondary boot loader. The initial boot will be done by the Windows loader in the MBR, which then can transfer control to LILO.

LILO backup files

/sbin/lilo can create the bootprogram in the MBR or in the first sectors of a partition. The bootprogram, sometimes referred to as the first stage loader, will try to load the second stage boot loader. The second stage bootloader is contained in a file on the boot partition of your Linux system; by default it is in the file /boot/boot.b.

If you use /sbin/lilo to write the bootprogram it will try to make a backup copy of the old contents of the bootsector and will write the old contents in a file named /boot/boot.####. The hash symbols are replaced by the major and minor numbers of the device where the original bootsector used to be. For example, the backup copy of the MBR on the first IDE disk would be stored as /boot/boot.0300: 3 is the major number for the device file /dev/hda, and 0 is the minor number for it. /sbin/lilo will not overwrite an already existing backup file.
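The naming scheme can be reproduced with a few lines of Python, which is handy when you need to find the backup that belongs to a given device. The sketch assumes that major and minor number are each written as two hexadecimal digits, as in the boot.0300 example; the function names are made up for the example.

```python
import os


def backup_name(major, minor, directory="/boot"):
    """Build the backup file name for a device's boot sector."""
    return "%s/boot.%02X%02X" % (directory, major, minor)


def backup_name_for(device_path):
    """Look up major/minor of a device file (Unix only) and build the name."""
    rdev = os.stat(device_path).st_rdev
    return backup_name(os.major(rdev), os.minor(rdev))


print(backup_name(3, 0))   # /dev/hda  -> /boot/boot.0300
print(backup_name(3, 1))   # /dev/hda1 -> /boot/boot.0301
print(backup_name(3, 64))  # /dev/hdb  -> /boot/boot.0340
```

Such a backup holds the old contents of the boot sector, so it can be used to put the previous boot program back if a LILO installation goes wrong.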

LILO errors

When LILO loads itself, it displays the word

LILO

Each letter is printed before or after performing some specific action. If LILO fails at some point, the letters printed so far can be used to identify the problem.

(nothing)

No part of LILO has been loaded. Either LILO isn't installed or the partition on which its boot sector is located isn't active.

L error

The first stage boot loader has been loaded and started, but it can't load the second stage boot loader. The two-digit error codes indicate the type of problem. This condition usually indicates a media failure or a geometry mismatch. The most frequent causes for a geometry mismatch are not physical defects or invalid partition tables but errors during the installation of LILO. Often these are caused by ignoring the 1024 cylinder boundary.

If the error code signals a transient problem, LILO will try to resume or halt the system. However, sometimes the problem is not transient and LILO will repeat the error code over and over again. This means that you end up with a scrolling screen that contains just the error codes. For example: the error code 01 signifies an illegal command. This indicates that the disk type is not supported by your BIOS or that the geometry cannot be determined correctly. Other error codes are described in full in LILO's user documentation.

LI

The first stage boot loader was able to load the second stage boot loader, but has failed to execute it. This can either be caused by a geometry mismatch or by moving /boot/boot.b without running the map installer.

LIL

The second stage boot loader has been started, but it can't load the descriptor table from the map file. This is typically caused by a media failure or by a geometry mismatch.

LIL?

The second stage boot loader has been loaded at an incorrect address. This is typically caused by a subtle geometry mismatch or by moving /boot/boot.b without running the map installer.

LIL-

The descriptor table is corrupt. This can either be caused by a geometry mismatch or by moving /boot/map without running the map installer.

LILO

All parts of LILO have been successfully loaded.
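The list above can be condensed into a small lookup table, sketched here in Python as a memory aid. The descriptions are shortened from the text above; the function name and the handling of scrolling error codes are assumptions made for this example.

```python
import re

DIAGNOSIS = {
    "": "No part of LILO loaded: not installed, or its partition is not active.",
    "L": "First stage started but cannot load the second stage: "
         "media failure or geometry mismatch.",
    "LI": "Second stage loaded but not executed: geometry mismatch, "
          "or /boot/boot.b moved without running the map installer.",
    "LIL": "Second stage started but cannot load the descriptor table: "
           "media failure or geometry mismatch.",
    "LIL?": "Second stage loaded at an incorrect address: geometry mismatch, "
            "or /boot/boot.b moved without running the map installer.",
    "LIL-": "Descriptor table corrupt: geometry mismatch, "
            "or /boot/map moved without running the map installer.",
    "LILO": "All parts of LILO have been successfully loaded.",
}


def diagnose(screen):
    """Map what LILO printed before it stopped to a likely cause."""
    text = screen.strip()
    # "L" followed by two-digit error codes, e.g. a scrolling "L 01 01 01"
    if re.fullmatch(r"L(\s*[0-9A-Fa-f]{2})+", text):
        return DIAGNOSIS["L"]
    return DIAGNOSIS.get(text, "Output not recognised as a LILO prompt.")


print(diagnose("LI"))
print(diagnose("L 01 01 01"))
```

In most of these cases the remedy is the same: boot from a rescue medium, check /etc/lilo.conf and run /sbin/lilo again so that the boot sector and the map file match the current disk layout.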

Copyright Snow B.V. The Netherlands