[Linuxha-users] Linuxha v2 [aka truecl] is getting closer!
Simon Edwards
simon.edwards at linuxha.net
Tue Jul 24 21:01:43 BST 2007
Hello all,
The follow-up to Linuxha 1.x is now really taking shape. Due to my
day-to-day workload it has taken far longer than expected to get this
far, but the software is now living up to the intended design. I've
captured some logs from cluster forming, application starting, status
reporting, application stopping and cluster halting to give an idea of
how things currently stand.
The "lha_form" routine is used to start the cluster. The example
cluster is a 4-node cluster running Slackware, though the distribution
does not matter. Notice the formation time is 5 seconds. An 8-node
cluster forms in less than 10 seconds.
root@slack10s1:/opt/truecl/log# lha_form --verbose
Date: 2007/07/24
143110 [ 4260] LOG Verbose logging mode selected.
143110 [ 4260] LOG Checking for available Request Daemons ...
143110 [ 4260] LOG Starting Support Daemons on
143110 [ 4260] LOG slack10s1,slack10s2,slack10s3,slack10s4 ...
143113 [ 4260] LOG slack10s1 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG slack10s2 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG slack10s3 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG slack10s4 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG Starting Cluster Daemons on
143113 [ 4260] LOG slack10s1,slack10s2,slack10s3,slack10s4 ...
143113 [ 4260] LOG slack10s1 OK STARTED
143113 [ 4260] LOG slack10s2 OK STARTED
143113 [ 4260] LOG slack10s3 OK STARTED
143113 [ 4260] LOG slack10s4 OK STARTED
143115 [ 4260] LOG slack10s1 acting as current cluster master.
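The 5 second figure can be read straight off the timestamps above (143110 to 143115). As a rough sketch only, and assuming each log line starts with an HHMMSS stamp, a bracketed process id and a level word as in the output shown, something like the following Python would work it out. The function names are made up for illustration and are not part of truecl, and a run that crosses midnight is not handled.

```python
import re
from datetime import datetime

# Assumed layout, taken from the sample output: "HHMMSS [  pid] LEVEL message"
LOG_LINE = re.compile(r"^(\d{6}) \[\s*(\d+)\] (\w+) (.*)$")

def parse_line(line):
    """Return (time, pid, level, message), or None for lines without a stamp."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    stamp, pid, level, message = m.groups()
    return datetime.strptime(stamp, "%H%M%S"), int(pid), level, message

def elapsed_seconds(lines):
    """Seconds between the first and last stamped line (same day assumed)."""
    stamps = [p[0] for p in map(parse_line, lines) if p is not None]
    if not stamps:
        return 0
    return int((stamps[-1] - stamps[0]).total_seconds())

sample = [
    "143110 [ 4260] LOG Verbose logging mode selected.",
    "143113 [ 4260] LOG slack10s1 OK STARTED",
    "143115 [ 4260] LOG slack10s1 acting as current cluster master.",
]
print(elapsed_seconds(sample))
```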
The "lha_startapp" routine starts an application. By default the
application is started on the current node, provided that node is
suitable for it.
root@slack10s1:/opt/truecl/log# lha_startapp -A test1 -V
Date: 2007/07/24
143443 [ 4279] WARN No configured or specified timeout for 'test1' - default
143443 [ 4279] WARN to 60.
143443 [ 4279] LOG Validated node 'slack10s1' is suitable for hosting
143443 [ 4279] LOG application 'test1'.
143443 [ 4279] LOG Attempting connection to Cluster Daemon on 'slack10s1'
143443 [ 4279] LOG ...
143443 [ 4279] LOG Connection to Cluster Daemon on 'slack10s1' successful.
143443 [ 4279] LOG Attempting connection to Master Cluster Daemon on
143443 [ 4279] LOG 'slack10s1' ...
143443 [ 4279] LOG Connection to Master Cluster Daemon on 'slack10s1'
143443 [ 4279] LOG successful.
143443 [ 4279] LOG Checking for available Request Daemons - please wait.
143443 [ 4279] LOG Required Request Daemons [slack10s1] running.
143443 [ 4279] LOG Attempting connection to Lock Daemon on 'slack10s1' ...
143443 [ 4279] LOG Connection to Lock Daemon on 'slack10s1' successful.
143443 [ 4279] LOG Attempting connection to Stat Daemon on 'slack10s1' ...
143443 [ 4279] LOG Connection to Stat Daemon on 'slack10s1' successful.
143443 [ 4279] LOG Stat Daemon on 'slack10s1' confirmed 'test1' is not
143443 [ 4279] LOG running.
143443 [ 4279] LOG Attempting storage activation on other nodes - please
143443 [ 4279] LOG wait...
143443 [ 4279] LOG other nodes:
143443 [ 4279] LOG slack10s2,slack10s1
143444 [ 4279] LOG Available relevant nodes have performed non-current
143444 [ 4279] LOG storage activation.
143444 [ 4279] LOG Attempting storage activation on node 'slack10s1' -
143444 [ 4279] LOG please wait...
143446 [ 4279] LOG Attempting final storage configuration on secondary nodes
143446 [ 4279] LOG - please wait...
143446 [ 4279] LOG Attempting to mount file systems on 'slack10s1' - please
143446 [ 4279] LOG wait...
143446 [ 4279] LOG File Systems mounted: OK=1, FAILED=0.
143446 [ 4279] LOG Application 'test1' IP configured successfully:
143446 [ 4279] LOG Configuring 192.168.1.243: /sbin/ifconfig eth0:1 inet
143446 [ 4279] LOG 192.168.1.243
143446 [ 4279] LOG Sending Builtin Gratuitous arp for eth0:1
143446 [ 4279] LOG Application Started successfully [RC=0].
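To illustrate the default node choice described above (start on the current node, but only once it has been validated as suitable), here is a small hypothetical sketch in Python. It is not the truecl code: the function name, the "requested_node" override and the choice to raise an error for an unsuitable node are all assumptions made for the example.

```python
def choose_start_node(current_node, valid_nodes, requested_node=None):
    """Pick the node an application should start on.

    Hypothetical rule, modelled on the behaviour described in this mail:
    default to the current node, and refuse any node that is not in the
    application's list of valid nodes.
    """
    target = requested_node if requested_node is not None else current_node
    if target not in valid_nodes:
        # What the real tool does in this case is not shown here; failing
        # loudly is simply the assumption made for this sketch.
        raise ValueError("node %r is not suitable for this application" % target)
    return target

# With 'test1' valid on slack10s1 and slack10s2, running on slack10s1:
print(choose_start_node("slack10s1", ["slack10s1", "slack10s2"]))
```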
Once the application is running, "lha_stat" gives an overview of the
cluster status:
root@slack10s1:/opt/truecl/log# lha_stat
cluster: slackcl - UP
nodes:   4 [0 DOWN/4 UP]
Node       Status  Apps
slack10s1  UP      1
slack10s2  UP      0
slack10s3  UP      0
slack10s4  UP      0
Appname  Status  Node       F/O  Notes
test1    UP      slack10s1  2
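For anyone wanting to feed this overview into a monitoring script, a minimal Python sketch along these lines would pull out the cluster state and the per-node rows. It assumes the layout is exactly as printed above and that a node's status is either UP or DOWN (only UP appears in this output), so treat it as a starting point rather than a description of a stable format. The function name is made up for the example.

```python
import re

def parse_lha_stat(text):
    """Extract cluster name/state, node counts and per-node rows."""
    summary = {"nodes": {}}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^cluster:\s*(\S+)\s*-\s*(\S+)$", line)
        if m:
            summary["cluster"], summary["state"] = m.groups()
            continue
        m = re.match(r"^nodes:\s*(\d+)\s*\[(\d+) DOWN/(\d+) UP\]$", line)
        if m:
            total, down, up = (int(x) for x in m.groups())
            summary.update(total=total, down=down, up=up)
            continue
        # Node rows look like "<name> <UP|DOWN> <app count>"; the header
        # and the application rows do not match this pattern.
        m = re.match(r"^(\S+)\s+(UP|DOWN)\s+(\d+)$", line)
        if m:
            summary["nodes"][m.group(1)] = (m.group(2), int(m.group(3)))
    return summary

output = """cluster: slackcl - UP
nodes: 4 [0 DOWN/4 UP]
Node Status Apps
slack10s1 UP 1
slack10s2 UP 0
slack10s3 UP 0
slack10s4 UP 0
Appname Status Node F/O Notes
test1 UP slack10s1 2"""

result = parse_lha_stat(output)
print(result["cluster"], result["state"], result["up"], "of", result["total"])
```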
Passing "-A appname" to "lha_stat" gives more detail on a particular
application:
root@slack10s1:/opt/truecl/log# lha_stat -A test1
Application  Status   Node       Storage  Validated  Valid Nodes
test1        RUNNING  slack10s1  DRBD1    Y          slack10s1,slack10s2
VG/LV         Type  Mount Point  Size    Status
testvg/test1  ext3  /test1       131072  Active,Syncing[12Kb/Sec]
Applications are stopped using "lha_stopapp", which again works more
quickly than Linuxha 1.x:
root@slack10s1:/opt/truecl/log# lha_stopapp -A test1 -V
Date: 2007/07/24
143634 [ 4321] WARN No configured or specified timeout for 'test1' - default
143634 [ 4321] WARN to 60.
143634 [ 4321] LOG Ascertaining current node for 'test1' ...
143634 [ 4321] LOG Application 'test1' is running on 'slack10s1'.
143634 [ 4321] LOG Attempting connection to Cluster Daemon on 'slack10s1'
143634 [ 4321] LOG ...
143634 [ 4321] LOG Connection to Cluster Daemon on 'slack10s1' successful.
143634 [ 4321] LOG Attempting connection to Master Cluster Daemon on
143634 [ 4321] LOG 'slack10s1' ...
143634 [ 4321] LOG Connection to Master Cluster Daemon on 'slack10s1'
143634 [ 4321] LOG successful.
143634 [ 4321] LOG Checking for available Request Daemons - please wait.
143634 [ 4321] LOG Following Request Daemons running:slack10s1,slack10s2
143634 [ 4321] LOG Attempting connection to Lock Daemon on 'slack10s1' ...
143634 [ 4321] LOG Connection to Lock Daemon on 'slack10s1' successful.
143634 [ 4321] LOG Attempting connection to Stat Daemon on 'slack10s1' ...
143634 [ 4321] LOG Connection to Stat Daemon on 'slack10s1' successful.
143634 [ 4321] LOG Stat Daemon on 'slack10s1' confirmed 'test1' is running.
143635 [ 4321] LOG Stopping application 'test1' on 'slack10s1'...
143635 [ 4321] LOG Application Stopped successfully.
143635 [ 4321] LOG Deconfiguration IP configuration for 'test1' on
143635 [ 4321] LOG 'slack10s1'...
143635 [ 4321] LOG IP Addresses deconfigured successfully.
143635 [ 4321] LOG Attempting to un-mount file systems on 'slack10s1' -
143635 [ 4321] LOG please wait...
143635 [ 4321] LOG File Systems un-mounted: OK=1, FAILED=0.
143635 [ 4321] LOG Attempting storage deactivation on 'slack10s1' ...
143635 [ 4321] LOG Re-attempting remote storage deactivation ...
143635 [ 4321] WARN slack10s1 : ERROR - Status information for device 0 not
143635 [ 4321] WARN found in '/proc/drbd'. from
143635 [ 4321] WARN run_before_shutdown_on_non_current
143635 [ 4321] LOG Available relevant nodes have performed storage
143635 [ 4321] LOG deactivation.
143635 [ 4321] LOG Available relevant nodes have performed storage
143635 [ 4321] LOG deactivation.
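Since the verbose output mixes LOG and WARN lines, it can be handy to pull out just the warnings, such as the '/proc/drbd' one above. The Python below is only a sketch under the assumption that every entry starts with "HHMMSS [ pid] LEVEL" and that any line without that prefix is a wrapped continuation of the previous entry. The helper name is invented for the example.

```python
import re

# Assumed entry prefix, based on the output shown: "HHMMSS [  pid] LEVEL text"
PREFIX = re.compile(r"^\d{6} \[\s*\d+\] (\w+) (.*)$")

def warn_messages(lines):
    """Return the text of WARN entries, folding in wrapped continuations."""
    entries = []  # list of [level, text]
    for line in lines:
        m = PREFIX.match(line)
        if m:
            entries.append([m.group(1), m.group(2).strip()])
        elif entries:
            # No prefix: treat as the wrapped tail of the previous entry.
            entries[-1][1] += " " + line.strip()
    return [text for level, text in entries if level == "WARN"]

stop_log = [
    "143635 [ 4321] LOG Re-attempting remote storage deactivation ...",
    "143635 [ 4321] WARN slack10s1 : ERROR - Status information for device",
    "0 not",
    "143635 [ 4321] WARN found in '/proc/drbd'. from",
    "143635 [ 4321] WARN run_before_shutdown_on_non_current",
    "143635 [ 4321] LOG Available relevant nodes have performed storage",
]
for message in warn_messages(stop_log):
    print(message)
```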
Finally, dissolving the cluster requires running the "lha_dissolve"
command:
root@slack10s1:/opt/truecl/log# lha_dissolve --verbose
Date: 2007/07/24
143705 [ 4328] LOG Checking for available Request Daemons ...
143705 [ 4328] LOG Querying which nodes are running Cluster Daemons ...
143706 [ 4328] LOG Current cluster master is 'slack10s1'.
143706 [ 4328] LOG Number of applications currently running: 0.
143706 [ 4328] LOG Stopping all 'clusterd' processes ...
143706 [ 4328] LOG Stopping all 'hbd' processes ...
143706 [ 4328] LOG Stopping all 'syncd' processes ...
143707 [ 4328] LOG Stopping all 'netd' processes ...
143707 [ 4328] LOG Stopping all 'lockd' processes ...
143707 [ 4328] LOG Stopping all 'statd' processes ...
143707 [ 4328] LOG Cluster has been halted.
An "alpha" version is now not far off being released. The network daemon
still needs to be written, more functionality needs to be added and a
huge amount of testing needs to be done, but overall things are looking
good!
Regards,
Simon.