I was playing with the Cloudera training virtual machine (as supplied by the Intro to Hadoop and MapReduce course over on Coursera) today and I installed it into VirtualBox as suggested by the instructor notes. I must have hit every pothole on the way through, though, so I thought I’d document them here for anyone else following this course.
Firstly, you have to use WinRAR to unzip the virtual machine disk supplied by the course. I’m not sure what is wrong with the zip file they supply, but it causes a CRC error when using 7-Zip, and the Windows built-in zip utility just fails. The offending file is the VMDK file, which is the virtual disk. It is quite sizeable at 4GB, so my guess is that the utility that zipped the file couldn’t handle it correctly. I’ve seen reports of just about every zip utility choking on the file except WinRAR and maybe the Linux zip libraries. You don’t have to buy WinRAR for this one operation as it has a free trial period.
Once you have the disk downloaded and unzipped, set up a new virtual machine and fire it up. If you are using VirtualBox you’ll probably want to install the guest additions to (dramatically) improve performance and give you a much better display. This is easier said than done, though. Normally it’s just a case of getting VirtualBox to insert the additions CD and supplying the root password, but as there is no root password you’ll need to install them manually. To complicate matters, the system doesn’t have the latest kernel, and the kernel headers are needed to compile the additions modules.
First install the latest kernel with:
sudo yum install kernel
This will install the kernel “kernel.i686 0:2.6.32-431.5.1.el6”, whereas the kernel currently on the machine is “kernel.i686 0:2.6.32-279.14.1.el6”. That may seem like a small change, but it means the headers you download later will match the kernel. The problem is that, as far as I can see, the headers aren’t easily available for the older kernel, hence installing the newer one.
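If you want to confirm which kernel is actually running versus the newest one installed, something like the following may help. It is only a sketch: it assumes “rpm -q kernel” prints one “kernel-<version>” line per installed kernel (which is what I’d expect on this CentOS 6 based machine) and that GNU “sort -V” is available. The helper name “newest_installed” is made up for this example.

```shell
#!/bin/sh
# Sketch: compare the running kernel with the newest installed kernel package.
# Assumes `rpm -q kernel` prints lines like kernel-2.6.32-431.5.1.el6.i686
# and that sort supports -V (GNU version sort).

# Read `rpm -q kernel` style lines on stdin and print the highest version,
# with the leading "kernel-" stripped so it is comparable with `uname -r`.
newest_installed() {
    sed 's/^kernel-//' | sort -V | tail -n 1
}

# Only attempt the comparison where rpm actually exists.
if command -v rpm >/dev/null 2>&1; then
    running="$(uname -r)"
    newest="$(rpm -q kernel | newest_installed)"
    if [ "$running" = "$newest" ]; then
        echo "running the newest installed kernel: $running"
    else
        echo "running $running but $newest is installed; a reboot into the newer kernel is still needed"
    fi
fi
```

Immediately after the yum install you should expect the second message, because the machine is still running the old 279 kernel until grub is pointed at the new one.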
Now that you’ve installed a new kernel you need to update grub to tell it to use the new kernel. I’ve been spoilt by Ubuntu based machines for too long, because there it’s just a case of running “sudo update-grub”. This training machine is CentOS based and doesn’t even seem to have the grub-mkconfig command available, which means you’ll need to hack the grub.conf file by hand. Fortunately the machine does have Emacs installed so not all is lost.
sudo su
cd /boot/grub
emacs grub.conf
Make the grub.conf file look like this:
default=1
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title Cloudera-Training-VM-4.1.1.c (2.6.32-279.14.1.el6.i686)
root (hd0,0)
kernel /boot/vmlinuz-2.6.32-279.14.1.el6.i686 ro root=LABEL=79d3d2d4
initrd /boot/initramfs-2.6.32-279.14.1.el6.i686.img
title Cloudera-Training-VM-4.1.1.c (2.6.32-431.5.1.el6.i686)
root (hd0,0)
kernel /boot/vmlinuz-2.6.32-431.5.1.el6.i686 ro root=LABEL=79d3d2d4
initrd /boot/initramfs-2.6.32-431.5.1.el6.i686.img
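Grub numbers the “title” entries from zero, in the order they appear in the file. If you’d rather not count them by eye, a small script can report the index for a given kernel version. This is a rough sketch rather than anything official: “grub_index_for” is a name I’ve invented, and it simply looks for the version string on each title line of a legacy grub.conf.

```shell
#!/bin/sh
# Sketch: print the zero-based position of the first "title" entry in a
# legacy grub.conf whose title line mentions the given kernel version.
# grub_index_for is a hypothetical helper, not part of grub itself.
#
# Usage: grub_index_for <version-substring> <path-to-grub.conf>
# Exits non-zero if no title line mentions the version.
grub_index_for() {
    version="$1"
    conf="$2"
    awk -v want="$version" '
        BEGIN { n = 0 }
        $1 == "title" {
            # index() does a plain substring match, so dots are literal.
            if (index($0, want)) { print n; found = 1; exit }
            n++
        }
        END { if (!found) exit 1 }
    ' "$conf"
}
```

Run as root against the file above, “grub_index_for 2.6.32-431.5.1.el6 /boot/grub/grub.conf” prints 1, which is the value to put after “default=”.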
The first line of grub.conf, “default=1”, tells grub to load the second kernel listed (our new 431 kernel), because grub counts entries from zero. Once you’ve updated and saved the file, restart the machine and then enter:
sudo yum install kernel-devel.i686
This will install the kernel headers necessary to build the modules which are part of the VirtualBox additions. Once the headers are installed, use the menu at the top of the VirtualBox window to insert the additions CD (under Devices). Don’t bother auto-running the CD as it won’t run. Go back to the command prompt and go to “/media/VBOXADDITIONS_???/”. The question marks denote the current version. In that directory run:
sudo sh VBoxLinuxAdditions.run
The system will then compile and install the additions automatically. Finally, restart the machine again to load the new modules. The biggest benefit of the additions for me is that when going full screen the guest automatically resizes to fit the display.
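If the display doesn’t resize after the restart, it is worth checking whether the additions modules actually loaded. The sketch below looks for “vboxguest”, the core guest additions kernel module, in the output of lsmod. The helper name “has_vboxguest” is my own, and the check says nothing about the other additions modules.

```shell
#!/bin/sh
# Sketch: check whether the VirtualBox guest kernel module is loaded.
# has_vboxguest is a hypothetical helper; it reads lsmod-style output on
# stdin and exits 0 only if a module named exactly "vboxguest" is listed.
has_vboxguest() {
    awk '$1 == "vboxguest" { found = 1 } END { exit !found }'
}

# Only run the live check where lsmod exists.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | has_vboxguest; then
        echo "vboxguest is loaded"
    else
        echo "vboxguest is not loaded; re-check the kernel headers and rerun the installer"
    fi
fi
```

If the module is missing, the most likely cause given the steps above is that the running kernel and the installed kernel-devel package don’t match, so the modules never compiled.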