[Linuxha-users] Linuxha v2 [aka truecl] is getting closer!

Simon Edwards simon.edwards at linuxha.net
Tue Jul 24 21:01:43 BST 2007


Hello all,

The follow-up to Linuxha 1.x is now really taking shape. Due to my
day-to-day workload it has taken far longer than expected to get this
far - but the software is now living up to the intended design. I've
captured some logs from cluster forming, application starting, status
reporting, application stopping and cluster halting to give an idea of
how things currently stand.

The "lha_form" routine is used to start the cluster - the example
cluster is a 4-node cluster running Slackware, though the distribution
does not matter. Notice the formation time is 5 seconds. An 8-node
cluster forms in less than 10 seconds.

root@slack10s1:/opt/truecl/log# lha_form --verbose
Date: 2007/07/24
143110 [ 4260] LOG    Verbose logging mode selected.
143110 [ 4260] LOG    Checking for available Request Daemons ...
143110 [ 4260] LOG    Starting Support Daemons on
143110 [ 4260] LOG    slack10s1,slack10s2,slack10s3,slack10s4 ...
143113 [ 4260] LOG    slack10s1 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG    slack10s2 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG    slack10s3 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG    slack10s4 : hbd YES,lockd YES,netd NO,syncd YES,statd YES
143113 [ 4260] LOG    Starting Cluster Daemons on
143113 [ 4260] LOG    slack10s1,slack10s2,slack10s3,slack10s4 ...
143113 [ 4260] LOG    slack10s1 OK STARTED
143113 [ 4260] LOG    slack10s2 OK STARTED
143113 [ 4260] LOG    slack10s3 OK STARTED
143113 [ 4260] LOG    slack10s4 OK STARTED
143115 [ 4260] LOG    slack10s1 acting as current cluster master.
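
As a rough illustration of where the 5 second figure comes from, the
timestamps in the log above can simply be subtracted. The following is
only a sketch in Python: the regular expression and the names
parse_line and elapsed_seconds are assumptions based on the line layout
shown above (HHMMSS, PID in brackets, level, message) and are not part
of truecl.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the output above: HHMMSS [ pid] LEVEL message
LINE_RE = re.compile(r"^(\d{6}) \[\s*(\d+)\]\s+(\w+)\s+(.*)$")

def parse_line(line):
    """Return (time, pid, level, message), or None for non-matching lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, pid, level, message = m.groups()
    return datetime.strptime(stamp, "%H%M%S"), int(pid), level, message

def elapsed_seconds(lines):
    """Seconds between the first and last timestamped lines (same day assumed)."""
    times = [p[0] for p in map(parse_line, lines) if p is not None]
    return int((times[-1] - times[0]).total_seconds())

sample = [
    "143110 [ 4260] LOG    Verbose logging mode selected.",
    "143113 [ 4260] LOG    slack10s1 OK STARTED",
    "143115 [ 4260] LOG    slack10s1 acting as current cluster master.",
]
print(elapsed_seconds(sample))  # 5
```

Lines without the timestamp prefix, such as the "Date:" header, are
skipped rather than treated as errors.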

The "lha_startapp" routine starts an application. By default it runs on
the current node if that node is suitable for the application.

root@slack10s1:/opt/truecl/log# lha_startapp -A test1 -V
Date: 2007/07/24
143443 [ 4279] WARN   No configured or specified timeout for 'test1' - default
143443 [ 4279] WARN   to 60.
143443 [ 4279] LOG    Validated node 'slack10s1' is suitable for hosting
143443 [ 4279] LOG    application 'test1'.
143443 [ 4279] LOG    Attempting connection to Cluster Daemon on 'slack10s1'
143443 [ 4279] LOG    ...
143443 [ 4279] LOG    Connection to Cluster Daemon on 'slack10s1' successful.
143443 [ 4279] LOG    Attempting connection to Master Cluster Daemon on
143443 [ 4279] LOG    'slack10s1' ...
143443 [ 4279] LOG    Connection to Master Cluster Daemon on 'slack10s1'
143443 [ 4279] LOG    successful.
143443 [ 4279] LOG    Checking for available Request Daemons - please wait.
143443 [ 4279] LOG    Required Request Daemons [slack10s1] running.
143443 [ 4279] LOG    Attempting connection to Lock Daemon on 'slack10s1' ...
143443 [ 4279] LOG    Connection to Lock Daemon on 'slack10s1' successful.
143443 [ 4279] LOG    Attempting connection to Stat Daemon on 'slack10s1' ...
143443 [ 4279] LOG    Connection to Stat Daemon on 'slack10s1' successful.
143443 [ 4279] LOG    Stat Daemon on 'slack10s1' confirmed 'test1' is not
143443 [ 4279] LOG    running.
143443 [ 4279] LOG    Attempting storage activation on other nodes - please
143443 [ 4279] LOG    wait...
143443 [ 4279] LOG    other nodes:
143443 [ 4279] LOG    slack10s2,slack10s1
143444 [ 4279] LOG    Available relevant nodes have performed non-current
143444 [ 4279] LOG    storage activation.
143444 [ 4279] LOG    Attempting storage activation on node 'slack10s1' -
143444 [ 4279] LOG    please wait...
143446 [ 4279] LOG    Attempting final storage configuration on secondary nodes
143446 [ 4279] LOG    - please wait...
143446 [ 4279] LOG    Attempting to mount file systems on 'slack10s1' - please
143446 [ 4279] LOG    wait...
143446 [ 4279] LOG    File Systems mounted: OK=1, FAILED=0.
143446 [ 4279] LOG    Application 'test1' IP configured successfully:
143446 [ 4279] LOG    Configuring 192.168.1.243: /sbin/ifconfig eth0:1 inet
143446 [ 4279] LOG    192.168.1.243
143446 [ 4279] LOG    Sending Builtin Gratuitous arp for eth0:1
143446 [ 4279] LOG    Application Started successfully [RC=0].

Once the application is running, "lha_stat" gives an overview of the
cluster status:

root@slack10s1:/opt/truecl/log# lha_stat
cluster: slackcl - UP
nodes:   4 [0 DOWN/4 UP]

Node       Status  Apps
slack10s1  UP      1
slack10s2  UP      0
slack10s3  UP      0
slack10s4  UP      0

Appname   Status   Node       F/O   Notes
test1     UP       slack10s1  2     
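
For anyone wanting to script around this overview, the two tables are
plain whitespace-aligned columns and can be split into records quite
easily. The following Python is a minimal sketch under that assumption;
parse_tables is a made-up helper name, not something shipped with
truecl, and it would need changing for any header containing a space.

```python
def parse_tables(text):
    """Split whitespace-aligned tables (header row + data rows) into dicts."""
    tables = []
    for block in text.strip().split("\n\n"):
        rows = [line.split() for line in block.splitlines() if line.strip()]
        header, data = rows[0], rows[1:]
        # zip() stops at the shorter sequence, so an empty trailing
        # column (Notes here) is simply absent from the record.
        tables.append([dict(zip(header, row)) for row in data])
    return tables

sample = """\
Node       Status  Apps
slack10s1  UP      1
slack10s2  UP      0
slack10s3  UP      0
slack10s4  UP      0

Appname   Status   Node       F/O   Notes
test1     UP       slack10s1  2
"""

nodes, apps = parse_tables(sample)
busy = [n["Node"] for n in nodes if int(n["Apps"]) > 0]
print(busy)             # ['slack10s1']
print(apps[0]["Node"])  # slack10s1
```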

"lha_stat" can also be passed "-A appname" to give more detail on a
particular application:

root@slack10s1:/opt/truecl/log# lha_stat -A test1
Application  Status   Node       Storage  Validated  Valid Nodes
test1        RUNNING  slack10s1  DRBD1    Y          slack10s1,slack10s2

VG/LV         Type   Mount Point  Size    Status
testvg/test1  ext3   /test1       131072  Active,Syncing[12Kb/Sec]

Applications are stopped using "lha_stopapp", which again works more
quickly than Linuxha 1.x:

root@slack10s1:/opt/truecl/log# lha_stopapp -A test1 -V
Date: 2007/07/24
143634 [ 4321] WARN   No configured or specified timeout for 'test1' - default
143634 [ 4321] WARN   to 60.
143634 [ 4321] LOG    Ascertaining current node for 'test1' ...
143634 [ 4321] LOG    Application 'test1' is running on 'slack10s1'.
143634 [ 4321] LOG    Attempting connection to Cluster Daemon on 'slack10s1'
143634 [ 4321] LOG    ...
143634 [ 4321] LOG    Connection to Cluster Daemon on 'slack10s1' successful.
143634 [ 4321] LOG    Attempting connection to Master Cluster Daemon on
143634 [ 4321] LOG    'slack10s1' ...
143634 [ 4321] LOG    Connection to Master Cluster Daemon on 'slack10s1'
143634 [ 4321] LOG    successful.
143634 [ 4321] LOG    Checking for available Request Daemons - please wait.
143634 [ 4321] LOG    Following Request Daemons running:slack10s1,slack10s2
143634 [ 4321] LOG    Attempting connection to Lock Daemon on 'slack10s1' ...
143634 [ 4321] LOG    Connection to Lock Daemon on 'slack10s1' successful.
143634 [ 4321] LOG    Attempting connection to Stat Daemon on 'slack10s1' ...
143634 [ 4321] LOG    Connection to Stat Daemon on 'slack10s1' successful.
143634 [ 4321] LOG    Stat Daemon on 'slack10s1' confirmed 'test1' is running.
143635 [ 4321] LOG    Stopping application 'test1' on 'slack10s1'...
143635 [ 4321] LOG    Application Stopped successfully.
143635 [ 4321] LOG    Deconfiguration IP configuration for 'test1' on
143635 [ 4321] LOG    'slack10s1'...
143635 [ 4321] LOG    IP Addresses deconfigured successfully.
143635 [ 4321] LOG    Attempting to un-mount file systems on 'slack10s1' -
143635 [ 4321] LOG    please wait...
143635 [ 4321] LOG    File Systems un-mounted: OK=1, FAILED=0.
143635 [ 4321] LOG    Attempting storage deactivation on 'slack10s1' ...
143635 [ 4321] LOG    Re-attempting remote storage deactivation ...
143635 [ 4321] WARN   slack10s1 : ERROR - Status information for device 0 not
143635 [ 4321] WARN   found in '/proc/drbd'. from
143635 [ 4321] WARN   run_before_shutdown_on_non_current
143635 [ 4321] LOG    Available relevant nodes have performed storage
143635 [ 4321] LOG    deactivation.
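
When reading a long verbose log like this, it can help to pull out just
the WARN entries. The Python below is a sketch only: warn_messages is a
hypothetical helper, and treating each run of consecutive WARN lines as
one wrapped message is a heuristic guessed from the output above rather
than documented truecl behaviour. It also assumes every line still
carries its timestamp prefix.

```python
import re
from itertools import groupby

# Assumed layout, inferred from the output above: HHMMSS [ pid] LEVEL message
LINE_RE = re.compile(r"^\d{6} \[\s*\d+\]\s+(\w+)\s+(.*)$")

def warn_messages(lines):
    """Join each run of consecutive WARN lines into one message (heuristic)."""
    parsed = [m.groups() for m in map(LINE_RE.match, lines) if m]
    out = []
    for level, run in groupby(parsed, key=lambda p: p[0]):
        if level == "WARN":
            out.append(" ".join(msg for _, msg in run))
    return out

sample = [
    "143634 [ 4321] WARN   No configured or specified timeout for 'test1' - default",
    "143634 [ 4321] WARN   to 60.",
    "143634 [ 4321] LOG    Ascertaining current node for 'test1' ...",
    "143635 [ 4321] WARN   slack10s1 : ERROR - Status information for device 0 not",
    "143635 [ 4321] WARN   found in '/proc/drbd'. from",
    "143635 [ 4321] WARN   run_before_shutdown_on_non_current",
]
for message in warn_messages(sample):
    print(message)
```

On the sample this yields two messages: the default timeout notice and
the '/proc/drbd' status error.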

Finally, dissolving the cluster requires running the "lha_dissolve"
command:

root@slack10s1:/opt/truecl/log# lha_dissolve --verbose
Date: 2007/07/24
143705 [ 4328] LOG    Checking for available Request Daemons ...
143705 [ 4328] LOG    Querying which nodes are running Cluster Daemons ...
143706 [ 4328] LOG    Current cluster master is 'slack10s1'.
143706 [ 4328] LOG    Number of applications currently running: 0.
143706 [ 4328] LOG    Stopping all 'clusterd' processes ...
143706 [ 4328] LOG    Stopping all 'hbd' processes ...
143706 [ 4328] LOG    Stopping all 'syncd' processes ...
143707 [ 4328] LOG    Stopping all 'netd' processes ...
143707 [ 4328] LOG    Stopping all 'lockd' processes ...
143707 [ 4328] LOG    Stopping all 'statd' processes ...
143707 [ 4328] LOG    Cluster has been halted.

An "alpha" version is now not far off being released. The network daemon
still needs to be written, more tests and functionality need to be added,
and a huge amount of testing remains to be done - but overall things are
looking good!

Regards,
Simon.







