After I wrote my earlier post about setting up local NFS and enabling sparse provisioning, I ran into some performance issues with NFS. It turns out that you can still get sparse provisioning on local storage when it is set up as ext instead of LVM.

XenServer will create sparse disks for your VMs only on NFS-based volumes or locally formatted ext volumes. If you are using iSCSI or LVM-based local storage, it will fully allocate each disk. This is less of a problem when you have a thinly provisioned iSCSI volume which is also compressed (such as those provided by Nexenta/OpenSolaris), but when you have a limited amount of local storage, it's unfortunate that XenServer doesn't give you the option to create sparse disks out of the box.

Luckily, we can convert the storage to ext3 and enable thin provisioning.

First we will need to delete the current Local Storage, which involves quite a few steps.

    # lvdisplay
      --- Logical volume ---
      LV Name                /dev/VG_XenStorage-877a1f66-59dd-b1ba-0de3-d8a753d0a0b2/MGT
      VG Name                VG_XenStorage-877a1f66-59dd-b1ba-0de3-d8a753d0a0b2
      LV UUID                HlifV3-7B6J-rVCp-pAxP-Rc7e-viEQ-Tt819a
      LV Write Access        read/write
      LV Status              NOT available
      LV Size                4.00 MB
      Current LE             1
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto

Now we run xe pbd-list and find the PBD that matches our local SR.

    # xe pbd-list
    uuid ( RO)                  : e06a37a5-e1e6-4415-134e-f5ae4f7f7c17
        host-uuid ( RO): 99ae2c2e-b992-4b4d-a09d-15cba5483225
        sr-uuid ( RO): fc8c8fc5-4a06-18b9-517d-f19a18e50820
        device-config (MRO): location: /dev/xapi/cd
        currently-attached ( RO): true


    uuid ( RO)                  : 7a75d1b9-9aa5-d5f7-6b85-50d1175054c8
        host-uuid ( RO): 99ae2c2e-b992-4b4d-a09d-15cba5483225
        sr-uuid ( RO): a6db482a-5446-3fc3-6513-6933eb9d7915
        device-config (MRO): location: /dev/xapi/block
        currently-attached ( RO): true


    uuid ( RO)                  : 6c1eeeb0-3c05-fbee-4dca-e7490e007504
        host-uuid ( RO): 99ae2c2e-b992-4b4d-a09d-15cba5483225
        sr-uuid ( RO): f663b0ec-56b0-72dc-d546-4e0fc9be7ef6
        device-config (MRO): location: /opt/xensource/packages/iso; legacy_mode: true
        currently-attached ( RO): true


    uuid ( RO)                  : 825bc0eb-8ec9-01ab-e249-21146d66dd9a
        host-uuid ( RO): 99ae2c2e-b992-4b4d-a09d-15cba5483225
        sr-uuid ( RO): 877a1f66-59dd-b1ba-0de3-d8a753d0a0b2
        device-config (MRO): device: /dev/disk/by-id/scsi-SATA_WDC_WD800JD-75M_WD-WMAM9AJ38489-part3
        currently-attached ( RO): true

As we can see, it's the last one: its sr-uuid matches the UUID in the volume group name that lvdisplay reported. Now we unplug it and destroy it.

    # xe pbd-unplug uuid=825bc0eb-8ec9-01ab-e249-21146d66dd9a
    # xe pbd-destroy uuid=825bc0eb-8ec9-01ab-e249-21146d66dd9a
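
Instead of scanning the listing by eye, the PBD for a given SR can be picked out by script. The sketch below parses the default xe pbd-list layout shown above. The function name pbd_for_sr is my own invention, not part of xe, and it assumes each record prints its uuid line before its sr-uuid line, as in the listing above. On a live host, xe pbd-list sr-uuid=<SR UUID> --minimal should return the same value directly.

```shell
#!/bin/bash
# pbd_for_sr SR_UUID: read `xe pbd-list` output on stdin and print the uuid
# of every PBD whose sr-uuid equals SR_UUID.
# Hypothetical helper; assumes the "uuid" line precedes "sr-uuid" in a record.
pbd_for_sr() {
    awk -v want="$1" '
        # First field is exactly "uuid" only on the line that opens a record.
        $1 == "uuid"                   { pbd = $NF }
        # The last field of the sr-uuid line is the SR this PBD belongs to.
        $1 == "sr-uuid" && $NF == want { print pbd }
    '
}

# Usage on a XenServer host (not executed here):
#   xe pbd-list | pbd_for_sr 877a1f66-59dd-b1ba-0de3-d8a753d0a0b2
```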

Now let's remove the storage repository that was attached to it. First we get a list of the repositories.

    # xe sr-list
    uuid ( RO)                : a6db482a-5446-3fc3-6513-6933eb9d7915
        name-label ( RW): Removable storage
        name-description ( RW):
        host ( RO): xenserver-ueyqfddq
        type ( RO): udev
        content-type ( RO): disk


    uuid ( RO)                : fc8c8fc5-4a06-18b9-517d-f19a18e50820
        name-label ( RW): DVD drives
        name-description ( RW): Physical DVD drives
        host ( RO): xenserver-ueyqfddq
        type ( RO): udev
        content-type ( RO): iso


    uuid ( RO)                : 877a1f66-59dd-b1ba-0de3-d8a753d0a0b2
        name-label ( RW): Local storage
        name-description ( RW):
        host ( RO): <not in database>
        type ( RO): lvm
        content-type ( RO): user


    uuid ( RO)                : f663b0ec-56b0-72dc-d546-4e0fc9be7ef6
        name-label ( RW): XenServer Tools
        name-description ( RW): XenServer Tools ISOs
        host ( RO): xenserver-ueyqfddq
        type ( RO): iso
        content-type ( RO): iso

We will forget the one named "Local storage" (type lvm).

    # xe sr-forget uuid=877a1f66-59dd-b1ba-0de3-d8a753d0a0b2

At this point it will be forgotten and disappear from XenCenter.

Now let's delete it from LVM.

    # vgdisplay
      --- Volume group ---
      VG Name               VG_XenStorage-877a1f66-59dd-b1ba-0de3-d8a753d0a0b2
      System ID
      Format                lvm2
      Metadata Areas        1
      Metadata Sequence No  3
      VG Access             read/write
      VG Status             resizable
      MAX LV                0
      Cur LV                1
      Open LV               0
      Max PV                0
      Cur PV                1
      Act PV                1
      VG Size               66.85 GB
      PE Size               4.00 MB
      Total PE              17113
      Alloc PE / Size       1 / 4.00 MB
      Free PE / Size        17112 / 66.84 GB
      VG UUID               2PJJkR-ULpa-1F6f-8H65-N22A-o11C-f0KkGa

Now let's remove it.

    # vgremove VG_XenStorage-877a1f66-59dd-b1ba-0de3-d8a753d0a0b2
    Do you really want to remove volume group "VG_XenStorage-877a1f66-59dd-b1ba-0de3-d8a753d0a0b2" containing 1 logical volumes? [y/n]: y
      Logical volume "MGT" successfully removed
      Volume group "VG_XenStorage-877a1f66-59dd-b1ba-0de3-d8a753d0a0b2" successfully removed

    # pvdisplay
      "/dev/sda3" is a new physical volume of "66.86 GB"
      --- NEW Physical volume ---
      PV Name               /dev/sda3
      VG Name
      PV Size               66.86 GB
      Allocatable           NO
      PE Size (KByte)       0
      Total PE              0
      Free PE               0
      Allocated PE          0
      PV UUID               ZK23c0-UvDg-A4MB-TWKi-YfNW-hWUK-uoyqdo

Now we remove the physical volume from LVM.

    # pvremove /dev/sda3
      Labels on physical volume "/dev/sda3" successfully wiped
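
To recap the teardown, here is a small dry-run sketch that prints the removal commands in the order used above without running any of them. The function name teardown_plan is hypothetical. It relies on the volume group being named VG_XenStorage-<SR uuid>, which is what lvdisplay and vgdisplay showed on my system.

```shell
#!/bin/bash
# teardown_plan SR_UUID PBD_UUID PV_DEVICE
# Prints (does not execute) the steps for removing an LVM-backed local SR.
teardown_plan() {
    local sr="$1" pbd="$2" pv="$3"
    echo "xe pbd-unplug uuid=$pbd"
    echo "xe pbd-destroy uuid=$pbd"
    echo "xe sr-forget uuid=$sr"
    # Assumption: the volume group is named after the SR uuid.
    echo "vgremove VG_XenStorage-$sr"
    echo "pvremove $pv"
}

# Example with the values from this post:
teardown_plan 877a1f66-59dd-b1ba-0de3-d8a753d0a0b2 \
              825bc0eb-8ec9-01ab-e249-21146d66dd9a \
              /dev/sda3
```

Reviewing the printed plan before typing the commands by hand is a cheap way to catch a mismatched UUID.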

The instructions that follow use a different system, and a different hard disk.

I am using a 2 TB disk and will create two 888 GB partitions for local storage and one of roughly 200 GB for ISOs.

Since XenServer will only allow a single SR on a drive, we'll need to create a fake RAID array on the second data partition. To this end we change the type of that partition to "Linux raid autodetect", so that the array is assembled automatically at boot.

    # fdisk /dev/sda

    The number of cylinders for this disk is set to 243133.
    There is nothing wrong with that, but this is larger than 1024,
    and could in certain setups cause problems with:
    1) software that runs at boot time (e.g., old versions of LILO)
    2) booting and partitioning software from other OSs
       (e.g., DOS FDISK, OS/2 FDISK)

    Command (m for help): m
    Command action
       a   toggle a bootable flag
       b   edit bsd disklabel
       c   toggle the dos compatibility flag
       d   delete a partition
       l   list known partition types
       m   print this menu
       n   add a new partition
       o   create a new empty DOS partition table
       p   print the partition table
       q   quit without saving changes
       s   create a new empty Sun disklabel
       t   change a partition's system id
       u   change display/entry units
       v   verify the partition table
       w   write table to disk and exit
       x   extra functionality (experts only)

    Command (m for help): p

    Disk /dev/sda: 1999.8 GB, 1999844147200 bytes
    255 heads, 63 sectors/track, 243133 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1         499     4008186   83  Linux
    /dev/sda2             500         998     4008217+  83  Linux
    /dev/sda3             999      243133  1944949387+  83  Linux

    Command (m for help): d
    Partition number (1-4): 3

    Command (m for help): p

    Disk /dev/sda: 1999.8 GB, 1999844147200 bytes
    255 heads, 63 sectors/track, 243133 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1         499     4008186   83  Linux
    /dev/sda2             500         998     4008217+  83  Linux

    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    e
    Partition number (1-4): 3
    First cylinder (999-243133, default 999):
    Using default value 999
    Last cylinder or +size or +sizeM or +sizeK (999-243133, default 243133):
    Using default value 243133

    Command (m for help): p

    Disk /dev/sda: 1999.8 GB, 1999844147200 bytes
    255 heads, 63 sectors/track, 243133 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1         499     4008186   83  Linux
    /dev/sda2             500         998     4008217+  83  Linux
    /dev/sda3             999      243133  1944949387+   5  Extended

    Command (m for help): n
    Command action
       l   logical (5 or over)
       p   primary partition (1-4)
    l
    First cylinder (999-243133, default 999):
    Using default value 999
    Last cylinder or +size or +sizeM or +sizeK (999-243133, default 243133): +888G

    Command (m for help): n
    Command action
       l   logical (5 or over)
       p   primary partition (1-4)
    l
    First cylinder (108960-243133, default 108960):
    Using default value 108960
    Last cylinder or +size or +sizeM or +sizeK (108960-243133, default 243133): 888G
    Value out of range.
    Last cylinder or +size or +sizeM or +sizeK (108960-243133, default 243133): +888G

    Command (m for help): n
    Command action
       l   logical (5 or over)
       p   primary partition (1-4)
    l
    First cylinder (216921-243133, default 216921):
    Using default value 216921
    Last cylinder or +size or +sizeM or +sizeK (216921-243133, default 243133):
    Using default value 243133

    Command (m for help): p

    Disk /dev/sda: 1999.8 GB, 1999844147200 bytes
    255 heads, 63 sectors/track, 243133 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1         499     4008186   83  Linux
    /dev/sda2             500         998     4008217+  83  Linux
    /dev/sda3             999      243133  1944949387+   5  Extended
    /dev/sda5             999      108959   867196701   83  Linux
    /dev/sda6          108960      216920   867196701   83  Linux
    /dev/sda7          216921      243133   210555891   83  Linux

    Command (m for help): t
    Partition number (1-7): 6
    Hex code (type L to list codes): fd
    Changed system type of partition 6 to fd (Linux raid autodetect)

    Command (m for help): p

    Disk /dev/sda: 1999.8 GB, 1999844147200 bytes
    255 heads, 63 sectors/track, 243133 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1         499     4008186   83  Linux
    /dev/sda2             500         998     4008217+  83  Linux
    /dev/sda3             999      243133  1944949387+   5  Extended
    /dev/sda5             999      108959   867196701   83  Linux
    /dev/sda6          108960      216920   867196701   fd  Linux raid autodetect
    /dev/sda7          216921      243133   210555891   83  Linux

    Command (m for help): w
    The partition table has been altered!

    Calling ioctl() to re-read partition table.

    WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
    The kernel still uses the old table.
    The new table will be used at the next reboot.
    Syncing disks.

At this point I had to reboot the server because of the warning above. You may or may not get this warning and need to reboot before proceeding.
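
As a sanity check on the partition table above, the sizes can be recomputed from the cylinder ranges. This is only a sketch: the function names are mine, and it assumes the geometry fdisk printed (255 heads, 63 sectors per track, 512-byte sectors).

```shell
#!/bin/bash
# Bytes per cylinder, from the fdisk header: 255 heads * 63 sectors * 512 bytes.
CYL_BYTES=$((255 * 63 * 512))

# span_bytes START END: size in bytes of an inclusive cylinder range.
span_bytes() {
    echo $(( ($2 - $1 + 1) * CYL_BYTES ))
}

# span_gb START END: the same size in whole decimal gigabytes (10^9 bytes),
# rounded down.
span_gb() {
    echo $(( $(span_bytes "$1" "$2") / 1000000000 ))
}

span_gb 999 108959      # /dev/sda5
span_gb 216921 243133   # /dev/sda7
```

For the table above this prints 888 for /dev/sda5 and 215 for /dev/sda7, so the ISO partition is about 215 decimal GB, or roughly 200 GiB. The byte counts come out slightly higher than the Blocks column because each logical partition starts one track into its first cylinder.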

First let's set up our ISO repository.

    # mkfs.ext3 -m 0 /dev/sda7
    mke2fs 1.39 (29-May-2006)
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    26329088 inodes, 52638972 blocks
    0 blocks (0.00%) reserved for the super user
    First data block=0
    Maximum filesystem blocks=0
    1607 block groups
    32768 blocks per group, 32768 fragments per group
    16384 inodes per group
    Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872

    Writing inode tables: done
    Creating journal (32768 blocks): done
    Writing superblocks and filesystem accounting information: done

    This filesystem will be automatically checked every 36 mounts or
    180 days, whichever comes first. Use tune2fs -c or -i to override.

Now let's set up a folder for it and make sure it gets mounted at boot time.

    # mkdir /data

    # vi /etc/fstab
    LABEL=root-cjfffcbq    /           ext3     defaults   1 1
    /var/swap/swap.001     swap        swap     defaults   0 0
    none                   /dev/pts    devpts   defaults   0 0
    none                   /dev/shm    tmpfs    defaults   0 0
    none                   /proc       proc     defaults   0 0
    none                   /sys        sysfs    defaults   0 0

We need to add our directory so that it is mounted automatically at boot time. Let's add this entry:

    /dev/sda7              /data       ext3     defaults   1 2

This is what my file looked like at the end.

    LABEL=root-cjfffcbq    /           ext3     defaults   1 1
    /var/swap/swap.001     swap        swap     defaults   0 0
    none                   /dev/pts    devpts   defaults   0 0
    none                   /dev/shm    tmpfs    defaults   0 0
    none                   /proc       proc     defaults   0 0
    none                   /sys        sysfs    defaults   0 0
    /dev/sda7              /data       ext3     defaults   1 2
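
If you would rather not edit the file by hand, the same entry can be appended from a script. The sketch below is one way to do it so that running it twice does not duplicate the line. The function name add_fstab_entry is hypothetical, and it only looks for a line that starts with the device name, so an entry written by label or UUID would not be detected.

```shell
#!/bin/bash
# add_fstab_entry FILE DEVICE MOUNTPOINT
# Appends "DEVICE MOUNTPOINT ext3 defaults 1 2" to FILE unless FILE already
# contains a line starting with DEVICE followed by whitespace.
add_fstab_entry() {
    local file="$1" dev="$2" mnt="$3"
    if grep -q "^${dev}[[:space:]]" "$file"; then
        echo "entry for $dev already present"
    else
        printf '%s %s ext3 defaults 1 2\n' "$dev" "$mnt" >> "$file"
        echo "entry for $dev added"
    fi
}

# Usage (as root, after backing up the file; not executed here):
#   add_fstab_entry /etc/fstab /dev/sda7 /data
```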

Now let's try mounting it.

    # mount /data

If you get no output, that means it probably worked.

Let's check that it mounted successfully.

    # ls /data
    lost+found

Now let's create the directory that we will export.

    # mkdir /data/iso

Let's edit the exports file.

    # vi /etc/exports

The file should initially be empty. Let's add our entry.

    /data/iso 127.0.0.1(ro,no_root_squash,sync)

Save the file, and now let's start the nfs and portmap services.

    # service nfs start
    Starting NFS services: [ OK ]
    Starting NFS daemon: [ OK ]
    Starting NFS mountd: [ OK ]
    # service portmap start
    Starting portmap: [ OK ]

Let's make sure that the portmap and nfs services start at boot.

    # chkconfig --level 345 nfs on
    # chkconfig --level 345 portmap on

Let's verify that our directory is being exported.

    # exportfs
    /data/iso 127.0.0.1

Let's add an ISO SR as well. Right-click on the server in XenCenter and click "New Storage Repository". Select "NFS ISO" as the type, and enter the following parameters.

    Name: NFS Local ISO library
    Share Name: 127.0.0.1:/data/iso

Then click "Finish".
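
I used XenCenter for this step, but the same SR can probably be created from the command line. The sketch below only prints a candidate xe sr-create command so you can review it before running it. The function name iso_sr_command is hypothetical, and the parameter set (type=iso, content-type=iso, device-config:location) is my understanding of how NFS ISO SRs are created with xe, not something I ran for this post.

```shell
#!/bin/bash
# iso_sr_command HOST_UUID LOCATION LABEL
# Prints (does not execute) an xe command that should create an NFS ISO SR.
iso_sr_command() {
    local host="$1" location="$2" label="$3"
    printf 'xe sr-create host-uuid=%s content-type=iso type=iso name-label="%s" device-config:location=%s\n' \
        "$host" "$label" "$location"
}

# Example with a placeholder host uuid:
iso_sr_command HOST-UUID-HERE 127.0.0.1:/data/iso "NFS Local ISO library"
```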

Now we will create the actual Storage Repositories that will hold our VMs. I have decided to split mine in two as a way to short-stroke the disk: I will mostly use the first SR and put rarely used VMs into the second, since the first SR sits at the start of the disk and should have slightly higher throughput and better average seek time. If you would rather have a single SR, adjust the fdisk steps above accordingly.

First we create our ext3 filesystems.

    # mkfs.ext3 -m 0 /dev/sda5
    mke2fs 1.39 (29-May-2006)
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    108412928 inodes, 216799175 blocks
    0 blocks (0.00%) reserved for the super user
    First data block=0
    Maximum filesystem blocks=0
    6617 block groups
    32768 blocks per group, 32768 fragments per group
    16384 inodes per group
    Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

    Writing inode tables: done
    Creating journal (32768 blocks): done
    Writing superblocks and filesystem accounting information: done

    This filesystem will be automatically checked every 22 mounts or
    180 days, whichever comes first. Use tune2fs -c or -i to override.

    # mkfs.ext3 -m 0 /dev/sda6
    mke2fs 1.39 (29-May-2006)
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    108412928 inodes, 216799175 blocks
    0 blocks (0.00%) reserved for the super user
    First data block=0
    Maximum filesystem blocks=0
    6617 block groups
    32768 blocks per group, 32768 fragments per group
    16384 inodes per group
    Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

    Writing inode tables: done
    Creating journal (32768 blocks): done
    Writing superblocks and filesystem accounting information: done

    This filesystem will be automatically checked every 30 mounts or
    180 days, whichever comes first. Use tune2fs -c or -i to override.

Update (6/8/2010): It turns out it's not necessary to create the ext3 filesystems on the SR partitions yourself, as XenServer does it for you.

Now we will create the storage repositories on these partitions. As I mentioned earlier, we need to create a fake RAID array on top of the second partition, otherwise XenServer won't let us use it.

Update (6/10/2010): It looks like XenServer 5.6 has a bug that prevents it from booting when any partition is set to type "fd". If you are using 5.6, DO NOT CREATE an md device. Instead, partition the disk so that there is only a single data partition.

    # mdadm --create /dev/md0 --level=1 --raid-devices=1 /dev/sda6 --force
    mdadm: array /dev/md0 started.
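
Before building an SR on the array, it may be worth confirming that the kernel reports it as active. The sketch below checks text in the usual /proc/mdstat layout, where an array line looks like "md0 : active raid1 sda6[0]". The function name md_is_active is hypothetical, and the layout is an assumption on my part since I did not capture /proc/mdstat for this post.

```shell
#!/bin/bash
# md_is_active NAME: read /proc/mdstat-style text on stdin and succeed
# (exit status 0) only if array NAME is listed as active.
md_is_active() {
    awk -v md="$1" '
        # Array lines look like: md0 : active raid1 sda6[0]
        $1 == md && $3 == "active" { found = 1 }
        END { exit !found }
    '
}

# Usage on the host (not executed here):
#   md_is_active md0 < /proc/mdstat && echo "md0 is up"
```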

Now let's set up our Storage Repositories. This will take several minutes (about 15 minutes for me for a 1 TB repo), so don't panic if you don't see any progress for a while.

    # xe sr-create host-uuid=999847a4-d895-4e30-a703-1a20f9930cfd content-type=user name-label="Local SR1" shared=false device-config:device=/dev/sda5 type=ext
    0414bfc6-bad9-81ed-ead9-9d3c42f19b61

Now we set up a Storage Repository on top of the fake array.

    # xe sr-create host-uuid=999847a4-d895-4e30-a703-1a20f9930cfd content-type=user name-label="Local SR2" shared=false device-config:device=/dev/md0 type=ext
    037fb965-61c5-1345-4d73-b7f9736133ad
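
The host-uuid in the two commands above is specific to my server. On a standalone host, xe host-list --minimal should print the value you need. The sketch below, with a hypothetical function name ext_sr_commands, prints one sr-create line per device in the same form I used, so the commands can be reviewed before they are run.

```shell
#!/bin/bash
# ext_sr_commands HOST_UUID DEVICE...
# Prints (does not execute) one `xe sr-create ... type=ext` command per
# device, labelling the SRs "Local SR1", "Local SR2", and so on.
ext_sr_commands() {
    local host="$1" n=1 dev
    shift
    for dev in "$@"; do
        printf 'xe sr-create host-uuid=%s content-type=user name-label="Local SR%d" shared=false device-config:device=%s type=ext\n' \
            "$host" "$n" "$dev"
        n=$((n + 1))
    done
}

# Usage on the host (not executed here):
#   ext_sr_commands "$(xe host-list --minimal)" /dev/sda5 /dev/md0
```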

Now we reboot, and if everything comes back up, we are done. From now on, when we create VMs, their virtual hard disks will be allocated sparsely, and VMs created from templates will all share the same base VHD and store only their changes in their own disk.
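
To see what sparse allocation means in practice, you can compare a file's apparent size with the space it actually occupies. The sketch below does this for a scratch file. The function name sparse_report is hypothetical, and it assumes GNU stat and dd. As far as I know, the VHD files of an ext SR live under /var/run/sr-mount/<SR uuid>/, so the same function could be pointed at one of them.

```shell
#!/bin/bash
# sparse_report FILE: print the apparent size and the allocated size of FILE,
# both in bytes. With GNU stat, %s is the apparent size, %b the number of
# allocated blocks, and %B the size in bytes of each of those blocks.
sparse_report() {
    local apparent blocks bsize
    apparent=$(stat -c %s "$1")
    blocks=$(stat -c %b "$1")
    bsize=$(stat -c %B "$1")
    echo "apparent=$apparent allocated=$((blocks * bsize))"
}

# Demo: create a 64 MiB file without writing any data, then report on it.
demo=$(mktemp)
dd if=/dev/zero of="$demo" bs=1 count=0 seek=64M 2>/dev/null
sparse_report "$demo"
rm -f "$demo"
```

On a filesystem that supports sparse files, the allocated figure should be far smaller than the apparent one.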