Chapter 2. System Startup (2.202)

Revision: $Revision: 1.13 $ ($Date: 2004/08/13 08:14:21 $)

This topic has a total weight of 5 points and contains the following 2 objectives:

Objective 2.202.1 Customizing system startup and boot processes (2 points)

This topic includes being able to edit the appropriate system startup scripts to customize standard system run levels and boot processes, interacting with run levels and creating custom initrd images as needed.

Objective 2.202.2 System recovery (3 points)

This topic includes being able to properly configure and navigate the standard Linux filesystem, configuring and mounting various filesystem types, and manipulating filesystems to adjust for disk space requirements or device additions.

Customizing system startup and boot processes (2.202.1)

This topic includes being able to edit the appropriate system startup scripts to customize standard system run levels and boot processes, interacting with run levels and creating custom initrd images as needed.

Key files, terms and utilities include:

/etc/init.d/
/etc/inittab
/etc/rc.d
mkinitrd Described in the section called “mkinitrd”

The Linux Boot process

A description of the boot process can also be found at Vanderbilt University.

The Linux boot process can be logically divided into six parts. They are as follows:

  1. Kernel loader loading, setup, and execution (bootsect.s)

    In this step the file bootsect.s is loaded into memory by the BIOS. bootsect.s then sets up a few parameters and loads the rest of the kernel into memory.

  2. Parameter setup and switch to 32-bit mode (setup.s)

    After the kernel has been loaded, setup.s takes over. It sets up a temporary IDT and GDT (explained later on) and handles the switch to 32-bit mode.

    Detailed information on IDT, GDT and LDT can be found on sandpile.org - The world's leading source for pure technical x86 processor information.

  3. Kernel decompression (compressed/head.s)

    The kernel is stored in a compressed format. This head.s (since there is another head.s) decompresses the kernel.

  4. Kernel setup (head.s)

    After the kernel is decompressed, head.s (the second one) takes over. The real GDT and IDT are created, as is a basic memory-paging table.

  5. Kernel and memory initialization (main.c)

    This step is the most complex. The kernel now has control, sets up all remaining parameters and initializes everything that remains. Virtual memory is set up completely and the first processes are created.

  6. Init process creation (main.c)

    In the final step of booting, the Init process is created.

Kernel Loader (linux/arch/i386/boot/bootsect.s)

When the computer is first turned on, BIOS loads the boot sector of the boot disk into memory at location 0x7C00. This first sector corresponds to the bootsect.s file. The BIOS will only copy 512 bytes, so the kernel loader must be small. The code that is loaded by the BIOS must be able to load the remaining portions of the operating system and pass control onto the next file.

The first thing that bootsect.s does when it is loaded is to move itself to segment 0x9000 (physical address 0x90000). This avoids any possible conflicts in memory. The code then jumps to the new copy. After this, an area in memory is set aside (0x4000-12) for a new disk parameter table. To allow more than one sector to be read from the disk at a time, the code tries to find the largest number of sectors that can be read in one operation. This speeds up reads from the disk when the rest of the kernel is loaded.

Before this is done, setup.s is loaded into memory directly above bootsect.s, at segment 0x9020 (physical address 0x90200). This allows setup.s to be jumped to after the kernel has been loaded. Now the disk parameter table is created. Basically, the code tries to read 36 sectors; if that fails it tries 18, then 15, and if all else fails it uses 9 as the default.
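
The fallback order described above can be sketched in shell as follows. This is only an illustration of the control flow, not the assembly in bootsect.s: can_read is a hypothetical stand-in for the BIOS read attempt, and TRACK_SECTORS is an assumed variable that simulates the number of sectors per track.

```shell
# Hypothetical stand-in for the BIOS read test: reading sector N of a track
# succeeds only if the simulated track has at least N sectors.
can_read() {
    [ "$1" -le "$TRACK_SECTORS" ]
}

# Try 36, 18 and 15 sectors per track in turn; fall back to 9.
probe_sectors() {
    for n in 36 18 15; do
        if can_read "$n"; then
            echo "$n"
            return 0
        fi
    done
    echo 9
}

TRACK_SECTORS=18    # simulated 1.44 MB floppy
probe_sectors       # prints 18
```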

If at any point there is an error, little can be done. In most cases, bootsect.s will just keep trying to do what it was doing when the error occurred. Usually this ends in an endless loop that can only be resolved by rebooting by hand.

At last we are ready to copy the kernel into memory. bootsect.s goes into a loop that reads the first 508 KB from the disk and places it into memory starting at 0x10000. After the kernel is loaded into RAM, bootsect.s jumps to segment 0x9020, where setup.s is loaded.
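
The addresses in this section are easier to follow when converted to physical addresses. The sketch below assumes that 0x9000 and 0x9020 are real-mode segment values (physical address = segment * 16 + offset), as with the INITSEG and SETUPSEG constants in bootsect.s. seg_to_phys is a helper name made up for this illustration.

```shell
# Convert a real-mode segment:offset pair to a physical address.
seg_to_phys() {
    printf '0x%X\n' $(( ($1 << 4) + $2 ))
}

seg_to_phys 0x07C0 0    # where the BIOS loads the boot sector: 0x7C00
seg_to_phys 0x9000 0    # bootsect.s after the move: 0x90000
seg_to_phys 0x9020 0    # setup.s: 0x90200

# End of a 508 KB image loaded at 0x10000: prints 0x8F000
printf '0x%X\n' $(( 0x10000 + 508 * 1024 ))
```

Under these assumptions the 508 KB limit makes sense: the image ends at 0x8F000, which leaves the relocated boot sector at 0x90000 untouched.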

Parameter Setup (linux/arch/i386/boot/setup.s)

setup.s makes sure that all hardware information has been collected and gathers more information if necessary. It first verifies that it is loaded at 0x9020. After this is verified, setup.s does the following:

  1. Gets main memory size

  2. Sets keyboard repeat rate to the maximum

  3. Retrieves video card information

  4. Collects monitor information for the terminal to use

  5. Gets information about the first and possibly second hard drive using BIOS

  6. Looks to see if there is a mouse (a pointing device) attached to the system

All of the information that setup.s collects is stored for later use by device drivers and other areas of the system. Like bootsect.s, if an error occurs little can be done. Most errors are “handled” by an infinite loop that has to be reset manually.

The next step in the booting process needs to use virtual memory. On an x86, this is only possible after switching from real mode to protected mode. After all information has been gathered by setup.s, it does a few more housekeeping chores to get ready for the switch to 32-bit mode.

First, all interrupts are disabled. Once the system is in 32-bit mode, no more BIOS calls can be made. The area of memory at 0x1000 is where the BIOS handlers were loaded when the system came up. These are no longer needed, so to get the compressed kernel out of the way, setup.s moves the kernel from 0x10000 to 0x1000. This provides room for a temporary IDT (Interrupt Descriptor Table) and GDT (Global Descriptor Table). The GDT is only set up to describe the system in memory. Paging is disabled, so that the described memory locations correspond to actual memory addresses. At this point, extended (or high) memory is enabled.

Setup also resets any present coprocessor and reconfigures the 8259 Programmable Interrupt Controller. All that remains now is for the protection-enable (PE) bit to be set, which puts the processor in 32-bit protected mode. After the switch has been made, setup.s lets processing continue at compressed/head.s to uncompress the kernel.

Kernel Decompression (linux/arch/i386/boot/compressed/head.s )

This first head.s uncompresses the kernel into memory. The kernel is gzip-compressed to make sure that it can fit into the 508 KB that bootsect.s will load. When the kernel is compiled, bootsect.s, setup.s, and compressed/head.s are not compressed and are placed in front of the compressed kernel. They are the only three files that must remain uncompressed.

head.s decompresses the kernel to address 0x100000. This corresponds to the 1 MB boundary in memory. Before it decompresses the kernel, head.s does a bit of error checking to ensure that there is enough memory available in high memory.

Right before the decompression is done, the flags register is reset and the area in memory where setup.s was is cleared. This is to put the system in a better known state. After the decompression, control is passed to the now decompressed head.s.

Kernel Setup (linux/arch/i386/kernel/head.s)

The second head.s is responsible for setting up the permanent IDT and GDT, as well as a basic paging table. Before anything is done, the flags register is again reset. The first page in the paging system is set up at 0x5000. This page is filled with the information gathered by setup.s, which is copied from its location at 0x9000.

Next the processor type is determined. For 586s (Pentium) and higher there is a processor command that returns the type of processor. Unfortunately, the 386 and 486 do not have this feature so some tricks have to be employed. Each processor has only certain flags, so by trying to read and write to them you can determine the type of processor. If a coprocessor is present that is also detected.

After that, the IDT and GDT are set up, starting with the table for the IDT. Each interrupt gets an 8-byte descriptor. Each descriptor is initially set to ignore_int. This means that nothing of consequence will happen when the interrupt is called: all that ignore_int does is save the registers, print “unknown interrupts”, and then restore the registers.

Each IDT descriptor is divided into four two-byte sections. The top four bytes are called the WW, while the bottom four are the CW. The WW contains a two-byte offset, a P-flag set to 1, and a Descriptor Privilege Level. The CW has a selector and an offset. In total the IDT can contain up to 256 entries.

At this point the code sets up memory paging. In the x86 architecture, virtual memory uses three descriptors to establish an address: a Page Directory, a Page Table, and a Page Frame. The Page Directory is a table of all of the pages and what processes they match to. The Page Directory contains an index into the Page Table. The Page Table maps the virtual address to the beginning of a physical page in memory. The Page Frame and an offset use the beginning address of the physical page and can retrieve an actual location in memory. The three structures are set up by head.s. They make it so that the first 4 MB of memory is in the Page Directory. The kernel's virtual address is set to 0xC0000000, the start of the last gigabyte of the 4 GB address space.

Each memory address in an x86 has three parts. The first is the index into the Page Directory. The result of this index is the start of a specific Page Table. The second part of the 32-bit address is an offset into the Page Table. The Page Table has a 32-bit entry that corresponds to that offset. The top 20 bits are used to get an actual physical address. The lower 12 bits are used for administrative purposes. The physical address corresponds to the start of a physical page. The third part of the 32-bit address is an offset within this page, equal to a real memory location.
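
As a worked example, the three parts of an address can be extracted with shifts and masks. The sketch assumes the classic 32-bit x86 layout without extensions: 10 bits of Page Directory index, 10 bits of Page Table index, and a 12-bit offset within a 4 KB page. split_vaddr is a name chosen for this illustration.

```shell
# Split a 32-bit virtual address into its three paging components.
split_vaddr() {
    addr=$(( $1 ))
    dir=$(( (addr >> 22) & 0x3FF ))    # top 10 bits: Page Directory index
    tbl=$(( (addr >> 12) & 0x3FF ))    # middle 10 bits: Page Table index
    off=$(( addr & 0xFFF ))            # low 12 bits: offset within the page
    echo "dir=$dir table=$tbl offset=$off"
}

split_vaddr 0xC0000000    # prints: dir=768 table=0 offset=0
split_vaddr 0xC0101234    # prints: dir=768 table=257 offset=564

# One Page Table holds 1024 entries of 4 KB pages: prints 4 (MB)
echo $(( 1024 * 4096 / 1024 / 1024 ))
```

The last line shows why a single Page Table is enough to cover the first 4 MB of memory mentioned earlier.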

Almost everything is set up at this point. Control is now passed to the kernel's main function in main.c.

Kernel Initialization (linux/init/main.c)

All remaining setup and initialization functions are called from main.c. Paging, the IDT and most of the other systems have been initialized by now. main.c will make sure everything is in its proper place before it tries to start some processes and give control to init.

A call to the function start_kernel() is made. In essence, all that start_kernel() does is run through a list of init functions that need to be called. Such things as paging, traps, IRQs, the process scheduler and more are set up. The important work has already been done for memory and interrupts. Now all that has to be done is to have all of the tables filled in.

Init process creation (linux/init/main.c)

After all of the init functions have been called, main.c tries to start the init process. It tries three different copies of init in order: if the first doesn't work, it tries the second, and if that one doesn't work, it goes to the third. Here are the file names for the three init versions:

  1. /etc/init

  2. /bin/init

  3. /sbin/init

If none of these three inits work, then the system goes into single user mode. init is needed to log in multiple users and to manage many other tasks. If it fails, then the single user mode creates a shell and the system goes from there.
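
The fallback logic can be approximated from user space as shown below. Note the assumption: the kernel itself simply attempts to execute each path in turn and only moves on when that attempt fails, whereas this sketch merely tests whether a file exists and is executable. first_init is a hypothetical helper name.

```shell
# Print the first candidate that is an executable regular file,
# or a fallback message when none is usable.
first_init() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "none found: fall back to a single-user shell"
}

# Same order as in the list above.
first_init /etc/init /bin/init /sbin/init
```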

What happens next, what does /sbin/init do?

init is the parent of all processes. It reads the file /etc/inittab and creates processes based on its contents. One of the things it usually does is spawn gettys so that users can log in. It also defines so-called “runlevels”.

A “runlevel” is a software configuration of the system which allows only a selected group of processes to exist.

init recognizes the following runlevels:

runlevel 0 (reserved)

Runlevel 0 is used to halt the system.

runlevel 1 (reserved)

Runlevel 1 is used to get the system in single user mode.

runlevel 2-5

Runlevels 2,3,4 and 5 are multi-user runlevels.

runlevel 6

Runlevel 6 is used to reboot the system.

runlevel 7-9

Runlevels 7, 8 and 9 are also valid. Most of the Unix variants don't use these runlevels. On a particular Debian Linux System for instance, the /etc/rc<runlevel>.d directories, which we'll discuss later, are not implemented for these runlevels, but they could be.

runlevel s or S

Runlevels s and S are internally the same runlevel S, which brings the system into “single-user mode”. The scripts in the /etc/rcS.d directory are executed when booting the system. Although runlevel S is not meant to be activated by the user, it can be.

runlevels A, B and C

Runlevels A, B and C are so-called “on demand” runlevels. If the current runlevel is “2”, for instance, and an init A command is executed, the things to do for runlevel “A” are done but the actual runlevel remains “2”.
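
On a running System V style system, the runlevel command prints the previous and the current runlevel (for example “N 2”, where N means there was no previous runlevel). The sketch below only interprets such a string, using the runlevel meanings listed above. describe_runlevel is a hypothetical helper, and distributions may assign different roles to runlevels 2-5.

```shell
# Map the current runlevel (second word of the runlevel output) to its role.
describe_runlevel() {
    current=${1##* }    # keep the part after the last space
    case "$current" in
        0)     echo "halt" ;;
        1|s|S) echo "single-user" ;;
        [2-5]) echo "multi-user" ;;
        6)     echo "reboot" ;;
        *)     echo "not a standard runlevel" ;;
    esac
}

describe_runlevel "N 2"    # prints: multi-user
describe_runlevel "2 6"    # prints: reboot
```

On a real system the argument would come from the command itself, e.g. describe_runlevel "$(runlevel)".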

Configuring /etc/inittab

As mentioned earlier, init reads the file /etc/inittab to determine what it should do. An entry in this file has the following format:

id:runlevels:action:process

Included below is an example /etc/inittab file.

# The default runlevel.
id:2:initdefault:

# Boot-time system configuration/initialization script.
# This is run first except when booting in emergency (-b) mode.
si::sysinit:/etc/init.d/rcS

# What to do in single-user mode.
~~:S:wait:/sbin/sulogin

# /etc/init.d executes the S and K scripts upon change
# of runlevel.
#
# Runlevel 0 is halt.
# Runlevel 1 is single-user.
# Runlevels 2-5 are multi-user.
# Runlevel 6 is reboot.

l0:0:wait:/etc/init.d/rc 0
l1:1:wait:/etc/init.d/rc 1
l2:2:wait:/etc/init.d/rc 2
l3:3:wait:/etc/init.d/rc 3
l4:4:wait:/etc/init.d/rc 4
l5:5:wait:/etc/init.d/rc 5
l6:6:wait:/etc/init.d/rc 6
# Normally not reached, but fall through in case of emergency.
z6:6:respawn:/sbin/sulogin

# /sbin/getty invocations for the runlevels.
#
# The "id" field MUST be the same as the last
# characters of the device (after "tty").
#
# Format:
#  <id>:<runlevels>:<action>:<process>
1:2345:respawn:/sbin/getty 38400 tty1
2:23:respawn:/sbin/getty 38400 tty2
          

Description of an entry in /etc/inittab:

id

The id-field uniquely identifies an entry in the file /etc/inittab and can be 1-4 characters in length. For gettys and other login processes however, the id field should contain the suffix of the corresponding tty, otherwise the login accounting might not work.

runlevels

This field contains the runlevels for which the specified action should be taken.

action

The “action” field can have one of the following values:

respawn

The process will be restarted whenever it terminates (e.g. getty).

wait

The process will be started once when the specified runlevel is entered and init will wait for its termination.

once

The process will be executed once when the specified runlevel is entered.

boot

The process will be executed during system boot. The runlevels field is ignored.

bootwait

The process will be executed during system boot, while init waits for its termination (e.g. /etc/rc). The runlevels field is ignored.

off

This does absolutely nothing.

ondemand

A process marked with an on demand runlevel will be executed whenever the specified ondemand runlevel is called. However, no runlevel change will occur (on demand runlevels are “a”, “b”, and “c”).

initdefault

An initdefault entry specifies the runlevel which should be entered after system boot. If none exists, init will ask for a runlevel on the console. The process field is ignored. In the example above, the system will go to runlevel 2 after boot.

sysinit

The process will be executed during system boot. It will be executed before any boot or bootwait entries. The runlevels field is ignored.

powerwait

The process will be executed when the power goes down. init is usually informed about this by a process talking to a UPS connected to the computer. init will wait for the process to finish before continuing.

powerfail

As for powerwait, except that init does not wait for the process's completion.

powerokwait

This process will be executed as soon as init is informed that the power has been restored.

powerfailnow

This process will be executed when init is told that the battery of the external UPS is almost empty and the power is failing (provided that the external UPS and the monitoring process are able to detect this condition).

ctrlaltdel

The process will be executed when init receives the SIGINT signal. This means that someone on the system console has pressed the CTRL-ALT-DEL key combination. Typically one wants to execute some sort of shutdown either to get into single-user level or to reboot the machine.

kbdrequest

The process will be executed when init receives a signal from the keyboard handler that a special key combination was pressed on the console keyboard. Basically you want to map some keyboard combination to the “KeyboardSignal” action. For example, to map Alt-Uparrow for this purpose use the following in your keymaps file: alt keycode 103 = KeyboardSignal.

process

This field specifies the process that should be executed. If the process field starts with a “+”, init will not do utmp and wtmp accounting for that process. This is needed for gettys that insist on doing their own utmp/wtmp housekeeping; it is also a historic bug.
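
To make the four fields concrete, the sketch below splits inittab-style lines on the colon and lists the entries that apply to a given runlevel. It is an illustration only, not how init parses the file: entries_for_runlevel is a hypothetical helper, the sample lines are taken from the example earlier in this section, and entries with an empty runlevels field (such as sysinit) are deliberately not matched.

```shell
# Read inittab-style lines on stdin and print the entries whose
# runlevels field contains the requested runlevel.
entries_for_runlevel() {
    level=$1
    while IFS=: read -r id runlevels action process; do
        # Skip blank lines and comments.
        case "$id" in
            ''|'#'*) continue ;;
        esac
        case "$runlevels" in
            *"$level"*) echo "$id $action $process" ;;
        esac
    done
}

entries_for_runlevel 3 <<'EOF'
# /sbin/getty invocations for the runlevels.
l2:2:wait:/etc/init.d/rc 2
l3:3:wait:/etc/init.d/rc 3
1:2345:respawn:/sbin/getty 38400 tty1
2:23:respawn:/sbin/getty 38400 tty2
EOF
```

For runlevel 3 this lists the l3 entry and both getty entries, but not l2.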

The /etc/init.d/rc script

For each of the runlevels 0-6 there is an entry in /etc/inittab that executes /etc/init.d/rc ?, where “?” is 0-6, as you can see in the following line from the earlier example:

l2:2:wait:/etc/init.d/rc 2
          

So, what actually happens is that /etc/init.d/rc is called with the runlevel as a parameter.

The directory /etc contains several runlevel-specific directories, which in turn contain runlevel-specific symbolic links to scripts in /etc/init.d/. Those directories are:

$ ls -d /etc/rc*
/etc/rc.boot  /etc/rc1.d  /etc/rc3.d  /etc/rc5.d  /etc/rcS.d
/etc/rc0.d    /etc/rc2.d  /etc/rc4.d  /etc/rc6.d
          

As you can see, there also is a /etc/rc.boot directory. This directory is obsolete and has been replaced by the directory /etc/rcS.d. At boot time, the directory /etc/rcS.d is scanned first and then, for backwards compatibility, the /etc/rc.boot.

The name of the symbolic link either starts with an “S” or with a “K”. Let's examine the /etc/rc2.d directory:

$ ls /etc/rc2.d
K20gpm       S11pcmcia   S20logoutd  S20ssh      S89cron
S10ipchains  S12kerneld  S20lpd      S20xfs      S91apache
S10sysklogd  S14ppp      S20makedev  S22ntpdate  S99gdm
S11klogd     S20inetd    S20mysql    S89atd      S99rmnologin
          

If the name of the symbolic link starts with a “K”, the script is called with “stop” as a parameter to stop the process. This is the case for K20gpm, so the command becomes K20gpm stop. Let's find out what program or script is called:

$ ls -l /etc/rc2.d/K20gpm
lrwxrwxrwx 1 root root 13 Mar 23 2001 /etc/rc2.d/K20gpm -> ../init.d/gpm
          

So, K20gpm stop results in /etc/init.d/gpm stop. Let's see what happens with the “stop” parameter by examining part of the script:

#!/bin/sh
#
# Start Mouse event server
...
case "$1" in
  start)
     gpm_start
     ;;
  stop)
     gpm_stop
     ;;
  force-reload|restart)
     gpm_stop
     sleep 3
     gpm_start
     ;;
  *)
     echo "Usage: /etc/init.d/gpm {start|stop|restart|force-reload}"
     exit 1
esac
          

In the case ... esac construct, the first parameter, $1, is examined; if its value is “stop”, gpm_stop is executed.

On the other hand, if the name of the symbolic link starts with an “S”, the script is called with “start” as a parameter to start the process.

The scripts are executed in the lexical sort order of their filenames, so in the listing above S10sysklogd is run before S20ssh.
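
A dry-run sketch of what the rc script does with such a directory is shown below. It only prints the commands instead of running them, and it assumes that the K links are handled before the S links, as in Debian's rc script. plan_rc is a hypothetical name, and the demo directory is a temporary one created for the illustration.

```shell
# Print the commands rc would run for one runlevel directory:
# K links with "stop", then S links with "start".
# The shell expands each glob in sorted order, which gives the lexical ordering.
plan_rc() {
    dir=$1
    for link in "$dir"/K*; do
        [ -e "$link" ] && echo "$link stop"
    done
    for link in "$dir"/S*; do
        [ -e "$link" ] && echo "$link start"
    done
    return 0
}

# Demo with a throwaway directory standing in for /etc/rc2.d.
demo=$(mktemp -d)
touch "$demo/S20ssh" "$demo/S10sysklogd" "$demo/K20gpm"
plan_rc "$demo"
rm -rf "$demo"
```

The demo prints the K20gpm line first, followed by S10sysklogd and then S20ssh.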

Let's say we've got a daemon SomeDaemon, an accompanying script /etc/init.d/SDscript and we want SomeDaemon to be running when the system is in runlevel 2 but not when the system is in runlevel 3.

As you've read earlier, this means that we need a symbolic link, starting with an “S”, for runlevel 2 and a symbolic link, starting with a “K”, for runlevel 3. We've also determined that the daemon SomeDaemon is to be started after S19someotherdaemon, which implies S20 and K80, since starting and stopping are symmetrical: what is started first is stopped last. This is accomplished with the following set of commands:

# cd /etc/rc2.d
# ln -s ../init.d/SDscript S20SomeDaemon
# cd /etc/rc3.d
# ln -s ../init.d/SDscript K80SomeDaemon
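
The S20/K80 pairing follows the convention implied above that the stop number is 100 minus the start number. As a small sketch under that assumption, the matching K name can be derived from an S name. kill_name is a hypothetical helper and expects a well-formed name of the form S, two digits, then the service name.

```shell
# Derive the K link name from an S link name, e.g. S20SomeDaemon -> K80SomeDaemon.
kill_name() {
    seq=${1#S}                  # strip the leading S: 20SomeDaemon
    name=${seq#[0-9][0-9]}      # strip the two digits: SomeDaemon
    seq=${seq%"$name"}          # keep only the digits: 20
    # ${seq#0} drops a leading zero so that 08 is not read as octal.
    printf 'K%02d%s\n' $(( 100 - ${seq#0} )) "$name"
}

kill_name S20SomeDaemon    # prints: K80SomeDaemon
kill_name S08example       # prints: K92example
```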
          

Should you wish to manually start, restart or stop a process, it is good practice to use the appropriate script in /etc/init.d/, e.g. /etc/init.d/gpm restart to initiate the restart of the process.

Copyright Snow B.V. The Netherlands