FOR SOLARIS 10 6/06 ADMINS
Disclaimer: there is a veritable boatload of information on the new zfs
pools and filesystems, and on building generic containers, at
http://docs.sun.com. The following tutorial is exactly how I built out
our web farm and web object servers. I've left out the web object
containers as they're installed the same way as the web farm containers
and it'd get pretty lengthy. I will discuss, however, the caveats we ran
into with the network card interfaces. This is for the Tutorial
Competition 2 (do I put it here?)
How to build a web farm using containers
Hardware used and Operating System:
Solaris 10 6/06 release, generic install, slice 6 set for zfs pool
Four (4) T2000 4-core UltraSPARC, 8gb memory, 2 73gb SAS drives in a RAID-1 mirror (webobject servers)
Four (4) T1000 6-core UltraSPARC, 8gb memory, 1 80gb SATA drive (apache servers)
Now, before we get started,
I'll run down how the install was done. What I will not walk through is
configuring ALOM (Advanced Lights Out Management) as that's an entirely
different tutorial.
Running format and selecting disk 0 shows the following. For brevity,
I've removed the umount errors and added how much space each slice has
been assigned.
/dev/dsk/c1t0d0s0 is currently mounted on /. 10gb
/dev/dsk/c1t0d0s1 is currently used by swap. 4gb
/dev/dsk/c1t0d0s3 is currently mounted on /usr. 6gb
/dev/dsk/c1t0d0s4 is currently mounted on /var. 10gb
/dev/dsk/c1t0d0s5 is currently mounted on /opt. 10gb
/dev/dsk/c1t0d0s6 is part of active ZFS pool zones. 25gb
/dev/dsk/c1t0d0s7 is currently mounted on /export/home. 4gb
Each T1000 and T2000 has been
set up the same way with the T1000 having a little more because it's an
80gb SATA instead of a 73gb SAS.
So now we know that /dev/dsk/c1t0d0s6 is going to be the zfs pool with
25gb. ZFS really wants pools to be an entire disk, but in this case we
couldn't do that, since we'd already spent $60,000 total on these eight
Solaris boxes.
Confirm no pools exist:
Code:
# zpool list
no pools available
Create the pool using slice 6
on the disk and we want it to have a root of /export/zones and a
mountpoint of /export/zones/<name>:
Code:
# mkdir /export/zones
# zpool create -R /export/zones -m / zones /dev/dsk/c1t0d0s6
# zpool list
NAME    SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
zones    25G   2.76G   22.2G    11%  ONLINE  -
Now we have a functional 25gb
zfs pool in /export/zones. Time to create the zfs file systems for the
two containers, quota set to 12.5gb apiece.
Code:
# zfs create zones/zone1
# zfs set quota=12.5GB zones/zone1
# zfs create zones/zone2
# zfs set quota=12.5GB zones/zone2
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
zones        2.76G  21.8G  27.5K  /export/zones
zones/zone1  1.38G  11.1G  1.38G  /export/zones/zone1
zones/zone2  1.38G  11.1G  1.38G  /export/zones/zone2
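If you'd rather not do the quota arithmetic by hand, the even split can be scripted. This is only a sketch: quota_cmds is a helper name I made up, it assumes a shell with $(( )) arithmetic (ksh or bash, not the stock /bin/sh), and it just prints the zfs commands so you can look them over before running anything. Sizes are in megabytes to keep the math in whole numbers.

```shell
# quota_cmds POOL POOL_SIZE_MB FS...
# Prints (does not run) the zfs commands that give each filesystem
# an equal share of the pool. quota_cmds is a made-up helper name.
quota_cmds() {
    pool=$1
    size_mb=$2
    shift 2
    [ $# -gt 0 ] || return 1           # need at least one filesystem
    share=$((size_mb / $#))            # equal share, rounded down
    for fs in "$@"; do
        echo "zfs create $pool/$fs"
        echo "zfs set quota=${share}M $pool/$fs"
    done
}
```

Running quota_cmds zones 25600 zone1 zone2 prints a create line and a 12800M quota line (12.5gb) for each of the two filesystems.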
Confirm no zones exist and
create both zones. Zones will be labeled web1 and web2 respectively.
Code:
# zoneadm list
global
# zonecfg -z web1
web1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:web1> create
zonecfg:web1> add net
zonecfg:web1:net> set physical=bge0
zonecfg:web1:net> set address=10.127.91.251
zonecfg:web1:net> end
zonecfg:web1> set zonepath=/export/zones/zone1
zonecfg:web1> set autoboot=true
zonecfg:web1> verify
zonecfg:web1> commit
zonecfg:web1> exit
# zonecfg -z web2
web2: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:web2> create
zonecfg:web2> add net
zonecfg:web2:net> set physical=bge0
zonecfg:web2:net> set address=10.127.91.252/24
zonecfg:web2:net> end
zonecfg:web2> set zonepath=/export/zones/zone2
zonecfg:web2> set autoboot=true
zonecfg:web2> verify
zonecfg:web2> commit
zonecfg:web2> exit
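With eight boxes to build, typing the same zonecfg session over and over gets old. zonecfg can read its subcommands from a file with -f, so a sketch like the one below could generate that file. zone_cmds is a helper name I made up, and the /tmp path in the usage comment is only an example.

```shell
# zone_cmds ZONEPATH NIC ADDRESS
# Prints a zonecfg command file with the same settings as the
# interactive sessions. zone_cmds is a made-up helper name.
zone_cmds() {
    cat <<EOF
create
set zonepath=$1
set autoboot=true
add net
set physical=$2
set address=$3
end
verify
commit
EOF
}

# Usage on the Solaris box (not run here):
#   zone_cmds /export/zones/zone1 bge0 10.127.91.251/24 > /tmp/web1.cfg
#   zonecfg -z web1 -f /tmp/web1.cfg
```

Generating the file first, rather than piping straight into zonecfg, leaves something you can read over and keep around for the next box.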
Verify the zones are ok and fix
whatever's needing fixing. Errors will be the same for both zones so
I'm only reporting one zone. Build the zone after no errors are
reported. Building can take a while, but since this is a barebones
container (no special loopback filesystems, etc) it won't take too long.
It should be noted that the
number of files copied over will vary from system to system as certain
base directories are inherited automatically. More on this later.
Code:
# zoneadm -z web1 verify
/export/zones/zone1 must not be group readable.
/export/zones/zone1 must not be group executable.
/export/zones/zone1 must not be world readable.
/export/zones/zone1 must not be world executable.
could not verify zonepath /export/zones/zone1 because of the above errors.
zoneadm: zone web1 failed to verify
# chown -R root /export/zones/*
# chmod -R 700 /export/zones/*
# zoneadm -z web1 verify
<returns to prompt, no errors>
# zoneadm -z web1 install
Preparing to install zone <web1>.
Creating list of files to copy from the global zone.
Copying <70531> files to the zone.
Determining zone package initialization order.
Preparing to initialize <1190> packages on the zone.
Initializing package <59> of <1190>: percent complete: 4%
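The verify errors all say the same thing: the zonepath directory has to be mode 700. If you want to check that up front instead of waiting for zoneadm to complain, here is a minimal sketch. zonepath_ok is a helper name I made up, and it only looks at the permission bits, not at who owns the directory.

```shell
# zonepath_ok DIR
# Succeeds only if DIR is a directory with mode 700 (drwx------).
# zonepath_ok is a made-up helper name. It does not check ownership.
zonepath_ok() {
    perms=$(ls -ld "$1" 2>/dev/null | awk '{ print $1 }')
    case $perms in
        drwx------*) return 0 ;;       # a trailing ACL marker is fine
        *)           return 1 ;;
    esac
}

# Usage (not run here):
#   zonepath_ok /export/zones/zone1 || chmod 700 /export/zones/zone1
```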
While it's building, I'll take
a few minutes to mention special things to know about containers.
The global container is the actual server. If you don't specify an
inherited pkg directory in a container build, the container gets its own
copy of those files, and when you update packages in the global
container, the container's respective directories automatically get the
update as well. If I did the following in the initial build (you cannot
add an inherit-pkg-dir once the zone is installed):
Code:
# zonecfg -z web1
zonecfg:web1> add inherit-pkg-dir
zonecfg:web1:inherit-pkg-dir> set dir=/opt/sfw
zonecfg:web1:inherit-pkg-dir> end
zonecfg:web1>
This would mean that /opt/sfw inside web1 is a read-only loopback mount
of the global container's /opt/sfw, so the whole tree under it is shared
rather than copied: whatever I update there in the global container
shows up in web1 right away, and web1 itself can't write to it. You can
get medieval with this directive but we won't go into all that as it's
confusing as hell.
Another thing to note before we check back on our build is that the
network address you've specified for your container A: can't be in use
on the network already and B: needs a netmask (as in
address=10.127.91.252/24). If you forget to assign one, you'll have to
add the netmask to /etc/netmasks, and you won't be able to change the
netmask on the virtual interface with ifconfig.
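A quick way to catch a missing netmask before it bites is to scan the zone's configuration for addresses with no /prefix on them. This is a sketch under one assumption: that zonecfg export prints each address as an unindented "set address=..." line. addrs_without_netmask is a helper name I made up.

```shell
# addrs_without_netmask
# Reads zonecfg commands on stdin and prints every net address that
# has no /prefix on it. addrs_without_netmask is a made-up helper name.
addrs_without_netmask() {
    sed -n 's/^set address=//p' | grep -v '/'
}

# Usage on the Solaris box (not run here):
#   zonecfg -z web1 export | addrs_without_netmask
```

No output means every address carries a prefix. In that case the function returns non-zero, because grep found nothing to print.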
Checking back on our build,
it's done and here's the output. We're ready to boot it.
Code:
...
Initialized <1190> packages on zone.
Zone <web1> is initialized.
Installation of these packages generated warnings: <SMCliconv SMClibgcc SMCcoreu SUNWcsu SUNWtcatu SUNWipplu>
The file </export/zones/zone1/root/var/sadm/system/logs/install_log> contains a log of the zone installation.
# zoneadm -z web1 ready
# zoneadm -z web1 boot
At this point, you have a
bootable container that's up and running. You can see it in the process
list. It should be noted that in the global container, you'll see
everything under the sun as far as processes are concerned. However, if
you're in a container, you'll only see your own processes and nothing
relating to the global or other containers.
Code:
# ps -ef | grep -v grep | grep -i zone
    root 18973     1  0 09:35:24 ?        0:00 zoneadmd -z web1
Time to log in to the new
server:
Code:
# zlogin web1
[Connected to zone 'web1' pts/5]
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
#
Now you can add users accordingly and even register the IP in DNS as
your container name. You now have a virtualized server with a 12.5gb
quota, and because of the zfs filesystem it's sitting on, every block is
checksummed, so silent file corruption gets detected instead of slipping
by (actually repairing it needs redundancy in the pool, which a
single-slice pool like this one doesn't have). You can give complete
sudo access to a developer or web administrator in this container and
they can reboot it as often as they want, as it will never touch the
global server.
Also, if you want to do any administration on the container, here are
some examples:
Code:
# zoneadm -z web1 halt (halts the container right away; unlike init 0 inside the container, no shutdown scripts run)
# zoneadm -z web1 boot (boots the container, normal mode)
# zoneadm -z web1 boot -s (boots the container to single-user mode)
# zoneadm -z web1 reboot (halts the container, then boots it again)
# zoneadm -z web1 install (preps the container and builds)
# zoneadm -z web1 uninstall (uninstalls the container, add -F to skip the confirmation prompt)
# zoneadm -z web1 ready (readies the container)
# zoneadm list (lists running containers, add -cv to see every configured one)
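For scripting, zoneadm list also takes -p, which prints one colon-separated line per container (id, name, state, path). A minimal sketch that picks out anything not running is below. zones_not_running is a helper name I made up, and the field order is taken from how the 6/06 release prints it.

```shell
# zones_not_running
# Reads `zoneadm list -cp` style lines (id:name:state:path...) on stdin
# and prints "name state" for every container that is not running.
# zones_not_running is a made-up helper name.
zones_not_running() {
    awk -F: '$3 != "running" { print $2, $3 }'
}

# Usage on the Solaris box (not run here):
#   zoneadm list -cp | zones_not_running
```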
Congratulations, if you made it
this far, you have a fully-functional virtual server. Note that all
services that might be disabled on the global container will be enabled
on the container so you'll have to run svcadm a few times to turn off
things like ftp, telnet, finger, sendmail, and talk. (That's what I did
anyways for security)
Code:
# for i in sendmail telnet talk ftp finger
do
    svcadm disable $i && echo $i disabled
done
sendmail disabled
telnet disabled
talk disabled
ftp disabled
finger disabled
#
Hope this wasn't too long and I
have no idea how it'll be received but here it is! I'll list caveats
from my experience in another thread.
New stuff to remember when
putting containers or servers or whatever behind a load-balancer like
an F5 BIG IP.
If you have SSH enabled like I
do, you'll find that no matter what your timeout is, it will disconnect
after 5 minutes on the dot. Well, it took me piling through a bunch of
configs and trying to figure out what the F is going on...developers
are complaining because they can't leave a session idle for more than 5
minutes, etc...and it IS annoying.
By default, sshd will have
KeepAlive set to yes which allows KeepAlive packets to be sent from the
client, allowing the session to not really be "idle". This keeps
everyone happy because their ssh session can be there until the
universe melts.
SSH Client setup in /etc/ssh/ssh_config:
ServerAliveInterval 3600
ServerAliveCountMax 1
SSH Daemon setup in /etc/ssh/sshd_config:
ClientAliveInterval 3600
ClientAliveCountMax 1
Now, by restarting the daemon
with 'svcadm refresh network/ssh', the config is reread and by
reconnecting with ssh, I should be able to stay on for an hour idle
without disconnect, right? Wrong. Disconnect at the 5 minute mark
exactly. So what's the issue here, I wonder...
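One thing worth ruling out at this point: a keepalive can only hold a session open if it fires more often than whatever idle timeout sits in the path. The sketch below assumes the 5-minute cutoff is a 300-second idle timeout on a device between client and server, which is my assumption and not something confirmed here. Both function names are made up.

```shell
# alive_interval: reads ssh_config text on stdin and prints the first
# ServerAliveInterval value. The keyword match is case-sensitive here.
alive_interval() {
    awk '$1 == "ServerAliveInterval" { print $2; exit }'
}

# keepalive_beats_timeout INTERVAL IDLE_TIMEOUT (both in seconds)
# Succeeds only if a keepalive would fire before the idle timeout.
keepalive_beats_timeout() {
    [ "$1" -gt 0 ] && [ "$1" -lt "$2" ]
}

# Usage (not run here; 300 is the assumed idle timeout):
#   keepalive_beats_timeout "$(alive_interval < /etc/ssh/ssh_config)" 300 \
#       || echo "keepalive interval is too long to help"
```

With the 3600-second interval from the config, the check fails against a 300-second timeout, while an interval such as 240 would pass.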
So, in debug mode, I 'ssh -v -v
-v -l tkeller server1' where server1 is inside the F5's umbrella. I
also 'ssh -v -v -v -l tkeller server2' where server2 is NOT in the F5's
umbrella.
Now, setting aside everything else in the debug output, every ssh
session is given two environment variables: SSH_CLIENT and
SSH_CONNECTION.
Code:
$ server1 > echo $SSH_CLIENT
10.127.110.1
$ server2 > echo $SSH_CLIENT
10.127.91.250
Anyone else see the glaring difference? SSH_CLIENT should ALWAYS be the
IP of the machine you're connecting in from, which in my case is
10.127.91.250, NOT THE SWITCH OF THE F5. No wonder it was disconnecting:
it was sending the KeepAlive packets to the switch, which doesn't care
one way or another! *scream*
Moral of the story: turn debug
on and save yourself a ton of headache and useless research.
Original Tutorial by Vorlin for TheTAZZone-TAZForum
Originally posted on August 30th, 2006
Do not use, republish, in whole or in part, without the consent of
the Author. TheTAZZone policy is that Authors retain the rights to the
work they submit and/or post...we do not sell, publish, transmit, or
have the right to give permission for such...TheTAZZone merely retains
the right to use, retain, and publish submitted work within its
Network.

