Restoring GRUB on a dual-boot Windows/Linux system

GRUB (GRand Unified Bootloader):

To boot (start up) any personal computer, the hardware has to load the operating system from some device that has the operating system software stored on it. The typical IBM PC clone uses the BIOS (a program stored in a ROM or flash memory chip on the motherboard; the battery-backed CMOS memory only holds its settings) to read a small part of a hard disk drive (the bootblock), which has just enough space to hold a bootloader program that will load the operating system from the rest of the disk drive. The BIOS loads this bootloader program, and then that bootloader begins the process of loading the operating system (perhaps directly, or perhaps by loading another intermediary stage, with more options).

The reason for the multi-stage approach is that the initial BIOS program has to be small, and doesn't have much space to include a plethora of features. Likewise for the bootloader program stored in the bootblock of a hard drive. However, the bootloader is able to use the rest of the drive space to hold more elaborate programs, allowing you to change the way the system boots in a variety of ways; and because it's a program on the disk drive, it can be updated, as needed, to suit changes in disk drive technology.

The average PC is running a Microsoft operating system, and Microsoft's bootloader is very limited (deliberately, to some degree), and (usually) only one operating system is supported (theirs). Booting an alternative system usually requires a different bootloader (one that gives you a menu of operating system choices). GRUB is one such bootloader. It supports a menu of boot options, can automatically pick one (after a timeout, or after a failure to load the first choice), and can also require a password before continuing to boot the system.

How GRUB is used:

The computer uses the BIOS to load a bootloader (GRUB) from a drive.
Now, that bootloader (GRUB) takes over.
GRUB reads its configuration file, and presents you with a menu of devices to boot from, and/or makes a decision for you (boots a default drive, straight away, or after a delay).
Whatever the result, GRUB boots the device, and GRUB exits.

The beginning of a nightmare:

I recently converted a PC from just using the Windows 98 Second Edition operating system, to a Windows / Red Hat 9.0 Linux dual boot system (adding a new hard drive for Linux).

Unfortunately, Windows kept on corrupting another FAT32 drive on the system (it had three hard drives; first the Windows boot drive, then the new Linux drive, and a spare FAT32 partitioned data drive), so I re-arranged the order of the drives (swapped the last two drives around), and had to FDISK /MBR the boot drive to get Windows to boot again. Now, I could get Windows to boot up, and it stopped corrupting the spare FAT32 data drive, but in rewriting a Windows bootblock, I had removed the information that GRUB had stored in the master boot record. Now I had to use a separate boot (floppy) disk, to boot Linux instead of Windows.

Trying to restore GRUB was such an exercise in frustration that I put up with using a boot floppy for Linux for quite some time. The guides didn't cover my permutation of booting from a Windows drive with the GRUB loader, to a separate Linux drive, and didn't give me enough information to easily figure out how to do that. I finally managed to restore things, and this document details what I've done.

What I wanted to do was re-install GRUB into the master boot record of the first hard drive, so that every time I boot the PC in the normal fashion, I'm presented with the GRUB boot menu. The rest of GRUB was still installed in the /boot partition (the first one) on the Linux hard drive, although I had to edit the grub.conf file to point to the right partitions, since the drives were re-arranged. It was just the part in the Windows drive's master boot record that got lost (the GRUB booter, and whatever data tells it where to find the rest of itself).

My current disk drives:

Connector	Drive	Linux device name	GRUB device name	Comment
Floppy	Floppy	/dev/fd0	(fd0)	GRUB sees this drive as the first floppy disk drive.
1st IDE	Zip100 (master) drive	/dev/hda4	?	GRUB might see this device as (fd1).
2nd IDE	CD-ROM burner (master) drive	/dev/scd0		/dev/scd0 is a link to /dev/hdc
2nd IDE	DVD-ROM (slave) drive	/dev/hdd
3rd IDE	Hard disk (master) drive	/dev/hde	(hd0)	Windows boot drive.
3rd IDE	Hard disk (slave) drive	/dev/hdf	(hd1)	FAT32 data drive.
4th IDE	Hard disk (single) drive	/dev/hdg	(hd2)	Linux drive.

GRUB seems to hand out its (hdX) device names only to potentially bootable hard drives, in order, skipping devices that aren't (floppy disk drives get their own (fdX) names, and it doesn't appear to give you a way to boot from CD-ROMs). The drive numbering begins from zero (the first drive is drive zero).
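
As a rough illustration of that numbering, here is a minimal Python sketch. It assumes GRUB simply counts the hard drives in the order listed in the table above; the function name and the drive list are made up for this example, and it doesn't query GRUB or the BIOS in any way.

```python
# Hypothetical illustration only: the drive order is copied from the table
# above, and GRUB is assumed to count hard drives in that order, from zero.
bios_hard_disks = ["/dev/hde", "/dev/hdf", "/dev/hdg"]


def grub_drive_names(disks):
    """Map each Linux hard-disk device to a GRUB-style name, counting from zero."""
    return {disk: f"(hd{index})" for index, disk in enumerate(disks)}


names = grub_drive_names(bios_hard_disks)
for disk, grub_name in names.items():
    print(disk, "->", grub_name)
```

Swapping two drives in the list changes their GRUB names, which is why I had to edit grub.conf after re-arranging my drives.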

When two hard drives are connected to the same port, one of them acts as the “master” drive, the other as the “slave” (chosen by jumpers on the drive, or by their position on the cable when using a “cable-select” system, which involves a special cable and jumpering the drives to the cable-select mode). Despite the names, the master doesn't control the slave: each drive has its own built-in controller, and the system addresses whichever one it wants. The two drives do share one cable, though, and have to co-operate on it, which is part of the reason why certain disk drives don't work in a master/slave combination.

My Linux drive's partitions:

Partition	Mount point	Linux device name	GRUB device name	Comment
1st	/boot	/dev/hdg1	(hd2,0)	System boot files (kernel, etc.), plus GRUB boot and configuration files.
2nd	/var	/dev/hdg2	(hd2,1)	Files that vary a lot (e.g. logs and mail spools).
3rd	/	/dev/hdg3	(hd2,2)	System root directory.
4th	not used	/dev/hdg4	(hd2,3)	Extended partition, containing the following partitions:
5th	swap	/dev/hdg5	(hd2,4)	Used for virtual memory.
6th	/usr	/dev/hdg6	(hd2,5)	Non-system, non-user files (e.g. applications).
7th	/home	/dev/hdg7	(hd2,6)	User home space.

I don't know why Linux installed itself with its mount points in that order. That was how Disk Druid re-arranged things, as I added each partition. They're not in the order that I entered each partition.
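
The two naming columns in the table above follow a simple pattern: GRUB counts partitions from zero, Linux counts them from one. Here is a small Python sketch of that conversion. The drive map is copied from my drive table, and the function name is invented for this example.

```python
import re

# Assumed drive map, copied from the drive table earlier in this document.
drive_map = {"hd0": "/dev/hde", "hd1": "/dev/hdf", "hd2": "/dev/hdg"}


def grub_to_linux(grub_name, drives):
    """Convert a GRUB partition name such as (hd2,0) to a Linux device name."""
    match = re.fullmatch(r"\((hd\d+),(\d+)\)", grub_name)
    if match is None:
        raise ValueError(f"not a GRUB partition name: {grub_name!r}")
    drive, partition = match.group(1), int(match.group(2))
    # GRUB numbers partitions from zero, Linux from one.
    return f"{drives[drive]}{partition + 1}"


for name in ["(hd2,0)", "(hd2,2)", "(hd2,4)"]:
    print(name, "->", grub_to_linux(name, drive_map))
```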

The first three partitions are “primary” partitions; after that, other partitions are put inside an “extended” partition. The extended partition is partition number four, and the ones inside it start counting up from number five. I don't know why Linux is following this MS-DOS/Windows scheme of extended partitions, as it doesn't have to (drives can be partitioned differently, and MS-DOS/Windows can't read these Linux partitions, anyway). It's not as if the BIOS has to be able to recognise all the partitions; it just needs to access the ones necessary to begin booting.

I know that it's related to how PC-clone BIOSes read IDE drives, but I don't know what they'd do if you tried having more than four primary partitions (whether it'd just fail, or only find the first four, or something else). Other computer systems don't have this limitation (I've used them), so it's nothing to do with IDE drives themselves, no matter what anybody thinks. For example, my old Amiga let me set up many more partitions: I had six on one IDE drive in my Amiga 1200 computer, and there was no messy primary/extended partitioning scheme. It's a limitation of the hardware/software in IBM PC clones, and of any other system that can't handle this, if there are any (I haven't looked into how Macs do their partitioning for a long time). There's no reason why a non-Windows-centric computer system would have to have the same shortcomings.
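
As far as the on-disk format goes, the limit of four comes from the partition table kept in the master boot record. That 512-byte sector holds 446 bytes of boot code, four 16-byte partition entries, and a two-byte signature, so there is simply no room for a fifth entry. The Python sketch below parses such a table. The sample sector is made up for the example (it is not read from my drives), and the function names are my own.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446       # bytes 0..445 hold the boot code
ENTRY_SIZE = 16
ENTRY_COUNT = 4          # only four entries fit before the signature
SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR sector

# Entry layout: status (1), CHS start (3), type (1), CHS end (3),
# first sector (4), sector count (4). CHS fields are skipped here.
ENTRY_FORMAT = "<B3xB3xII"


def parse_partition_table(sector):
    """Return the four primary partition entries of an MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not a valid MBR sector")
    entries = []
    for slot in range(ENTRY_COUNT):
        start = TABLE_OFFSET + slot * ENTRY_SIZE
        status, ptype, first, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE])
        entries.append({"slot": slot + 1, "bootable": status == 0x80,
                        "type": ptype, "first_sector": first,
                        "sectors": count})
    return entries


# Made-up sample: a bootable Linux partition (type 0x83) in slot 1 and an
# extended partition (type 0x05) in slot 4; slots 2 and 3 are left empty.
sector = bytearray(SECTOR_SIZE)
sector[446:462] = struct.pack(ENTRY_FORMAT, 0x80, 0x83, 63, 204800)
sector[494:510] = struct.pack(ENTRY_FORMAT, 0x00, 0x05, 1000000, 5000000)
sector[510:512] = SIGNATURE

table = parse_partition_table(bytes(sector))
for entry in table:
    print(entry)
```

Anything beyond those four slots has to live somewhere else on the disk, which is what the extended partition is for.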

My /etc/fstab file:

# Usual system mount points:
LABEL=/      /                 ext3         defaults         1 1
LABEL=/boot  /boot             ext3         defaults         1 2
none         /dev/pts          devpts       gid=5,mode=620   0 0
LABEL=/home  /home             ext3         defaults         1 2
none         /proc             proc         defaults         0 0
none         /dev/shm          tmpfs        defaults         0 0
LABEL=/usr   /usr              ext3         defaults         1 2
LABEL=/var   /var              ext3         defaults         1 2
/dev/hdg5    swap              swap         defaults         0 0

# My extra ones:
/dev/hda4    /mnt/zip          auto         noauto,owner     0 0
/dev/cdrom   /mnt/dvdrom       udf,iso9660  noauto,owner,ro  0 0
/dev/cdrom1  /mnt/cdburner     udf,iso9660  noauto,owner,ro  0 0
/dev/hde1    /mnt/windowsboot  vfat         noauto,user
/dev/hdf1    /mnt/fatdata      vfat         noauto

The Linux system partitions have “labels,” but I could have used the device names (hdg1, etc.), instead. That was how the system automatically set itself up. Using labels means that the system finds the right partitions, even if you relocate them (e.g. plug the drive into a different port). However, it doesn't seem to support doing that for the swap partition, so that'll need changing by hand if the drive is shifted (do this before shifting a drive, as it can be difficult to boot Linux without a swap partition). You'll also suffer problems if you connect two drives that have partitions with the same label names; to get around that, you'd ignore the labels and refer directly to the device names.

The “auto” in the filesystem-type column (on the Zip drive's line) tells mount to auto-detect whatever filesystem has been used on the disk.

The “noauto” parameters stop the system from automatically mounting the device when it boots up. Users have to deliberately mount them.

The “owner” parameter lets an ordinary (non-root) user mount a device, provided that user owns the device file.

The “user” parameter means that any user may mount the device (it doesn't have to be mounted by the root user), although the same user has to be the one to unmount it. If I'd used “users”, instead, any user could unmount a device, even one that someone else had mounted.

The “/dev/cdrom” and “/dev/cdrom1” devices are actually links to “/dev/hdc” and “/dev/hdd” (some things that work with CDs or DVDs seem to expect device names with certain letters in them, and I mount them to mountpoints with names that make sense to me).
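
To make the column layout of those fstab lines a bit more concrete, here is a minimal Python sketch that splits a few of them into fields and picks out the ones marked “noauto”. The sample lines are copied from my file above; the function name is invented, and the sketch assumes the last two fields (dump and fsck pass) count as zero when they're left off, as on my last two lines.

```python
FSTAB_SAMPLE = """\
# A few lines copied from the fstab above
LABEL=/boot  /boot             ext3         defaults         1 2
/dev/hda4    /mnt/zip          auto         noauto,owner     0 0
/dev/hde1    /mnt/windowsboot  vfat         noauto,user
"""


def parse_fstab(text):
    """Split fstab lines into their six whitespace-separated fields."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # The dump and fsck-pass fields are optional; treat them as 0.
        fields += ["0"] * (6 - len(fields))
        device, mount_point, fstype, options, dump, fsck_pass = fields[:6]
        entries.append({
            "device": device,
            "mount_point": mount_point,
            "fstype": fstype,
            "options": options.split(","),
            "dump": int(dump),
            "pass": int(fsck_pass),
        })
    return entries


entries = parse_fstab(FSTAB_SAMPLE)
for entry in entries:
    if "noauto" in entry["options"]:
        print(entry["mount_point"], "is only mounted on request")
```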

My /boot/grub/grub.conf file:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
#
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd1,0)
#          kernel /vmlinuz-version ro root=/dev/hdg3
#          initrd /initrd-version.img
#
#boot=/dev/hdg

default=0
timeout=15

splashimage=(hd2,0)/grub/splash.xpm.gz
 
title Red Hat Linux (2.4.20-19.9)
        root (hd2,0)
        kernel /vmlinuz-2.4.20-19.9 ro root=LABEL=/ hdc=ide-scsi
        initrd /initrd-2.4.20-19.9.img
 
title LemonOS (Win98SE)
        rootnoverify (hd0,0)
        chainloader +1
 
title Floppy
        rootnoverify (fd0)
        chainloader +1
 
title Reboot
        reboot
 
title Halt
        halt

I've added the “reboot” and “halt” (shut down and power off) entries to work around problems caused by experimentation, and by accidentally rebooting instead of shutting down.

Other booting choices can be added, merely by editing this configuration file.

I'm not sure whether the “#boot=/dev/hdg” line is merely a note for you, or whether GRUB uses it.
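
The structure of the file is simple enough to pick apart with a few lines of Python: global settings come first, then each “title” line starts a menu entry, and the lines after it are the commands for that entry. The sketch below is only an illustration of that structure, using a shortened copy of my file. It is not how GRUB itself reads the file, and the function name is made up.

```python
GRUB_CONF_SAMPLE = """\
#boot=/dev/hdg
default=0
timeout=15

title Red Hat Linux (2.4.20-19.9)
        root (hd2,0)
        kernel /vmlinuz-2.4.20-19.9 ro root=LABEL=/ hdc=ide-scsi
        initrd /initrd-2.4.20-19.9.img

title LemonOS (Win98SE)
        rootnoverify (hd0,0)
        chainloader +1
"""


def parse_grub_conf(text):
    """Return (global settings, list of menu entries) from a grub.conf text."""
    settings = {}
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        if line.startswith("title "):
            entries.append({"title": line[len("title "):], "commands": []})
        elif entries:
            # Anything after a title belongs to the most recent entry.
            entries[-1]["commands"].append(line)
        elif "=" in line:
            key, value = line.split("=", 1)
            settings[key] = value
    return settings, entries


settings, entries = parse_grub_conf(GRUB_CONF_SAMPLE)
default_entry = entries[int(settings["default"])]
print("Default entry:", default_entry["title"])
```

The “default=0” setting counts from zero, like the drive and partition numbers, so it points at the first title in the file.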

Installing GRUB in a bootblock:

The GRUB manual talks about installing GRUB in a three-step process: setting a root drive, locating the files used by GRUB, and installing GRUB to a bootblock (running setup). I had trouble working out whether setting the root drive meant the drive the BIOS tries to boot (it didn't), or the “boot” partition/directory (it did), or the system “root” (it didn't). It refers to the root drive that GRUB uses during the boot-up process only; after the boot-up has finished, and the system is running, the system has its own root directory. GRUB couldn't find the “stage1” file it wanted for that process when I followed the steps in the guide, but it did if I replaced the mount point with the device name at the beginning of the file path (using (hd2,0), instead of /boot). And I wasn't sure whether you “setup” the drive where GRUB gets put into the bootblock, or the drive where GRUB has its boot location.

Here's how I re-installed GRUB:

Opened a terminal as the root user.
Entered grub (we're now in a command line interface shell for GRUB).
Entered root (hd2,0) (setting the boot drive where most of GRUB, and the system kernel files are).
Entered find (hd2,0)/grub/stage1 (the first file GRUB uses to boot up from). This step may not be necessary, but I wasn't about to erase a master bootblock, and try again, just to see if it was.
Received a list of drive partitions where it could be installed (as far as I could tell). According to the guide, this is a list of where it finds the “stage1” file, which was clearly nonsense (as it was not on the drives listed).
Entered setup (hd0) (to install the bootloader to the drive my system BIOS boots up).
Entered quit (to properly exit from the GRUB command line interface shell).

This is a slight variation from what's outlined in the “installation” section of the GRUB info file (instead of running man grub, run info grub in a terminal). I'd advise you to first try what the manual tells you, before what anyone else says (including myself).
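
For reference, the essential commands from my steps above can be written out as a short list. The Python sketch below only builds that text, so the two device names can be swapped for your own; it does not run GRUB or touch any disk. The function name is invented, and the “find” step is left out, since I'm not sure it was needed.

```python
def grub_install_script(boot_partition, target_drive):
    """Return the GRUB shell commands described above, one per line."""
    commands = [
        f"root {boot_partition}",  # partition holding /boot (GRUB's own files)
        f"setup {target_drive}",   # drive whose master boot record gets GRUB
        "quit",                    # leave the GRUB shell properly
    ]
    return "\n".join(commands) + "\n"


# The values I used: /boot is on (hd2,0), and the BIOS boots from (hd0).
script = grub_install_script("(hd2,0)", "(hd0)")
print(script, end="")
```

The output is just the three lines you would type at the grub> prompt, in order.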

If your BIOS has anti-virus features designed to stop modifications to the hard drive boot blocks, you may have to disable them before making any changes to your boot blocks (some may prompt you before allowing changes; others may prevent any changes, without any prompting).