[Linuxha-users] Caught in a loop

Michael Mansour mic at npgx.com.au
Mon Jan 31 04:50:46 GMT 2005


Hi Simon,

After thinking about this a lot and running through some of my own setups for 
it, I couldn't get it to work. What I tried was to have a separate 
/etc/httpd/conf.d.cluster directory together with the normal /etc/httpd/conf 
directory.

So I decided to test your suggestion below, but found that it didn't work 
either. The report shown in /var/log/cluster/apache.start.log is:

/apache/admin/scripts/apachectl start: httpd (pid 9605) already running

so it seemed no attempt at running another port 80 listener was made.

I'll explain this test setup for you. I set it up as follows:

1. /etc/httpd_local is the ServerRoot for the local httpd (local:80)

2. /etc/httpd is the ServerRoot for the cluster (clusternode1:80)

For 1, it starts up fine using the normal OS start scripts "service httpd 
start":

 9605 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
 9608 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
 9609 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
 9610 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
 9611 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
 9612 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
 9613 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
 9614 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local

and:

tcp        0      0 local:80              0.0.0.0:*               LISTEN

But when I then execute 2, the apache app starts up fine with "clstartapp":

[root at node1 conf]# clstartapp -A apache -V
INFO  31/01/2005 04:10:43 Validated checksum for cluster configuration
INFO  31/01/2005 04:10:43 Checked that node names resolve to IP addresses
INFO  31/01/2005 04:10:43 Validated Build run has completed against this configuration.
INFO  31/01/2005 04:10:43 drbd kernel module loaded already on xxx.xxx.xxx.xxx
INFO  31/01/2005 04:10:43 Checking heartbeats for any sign of life...
WARN  31/01/2005 04:10:43 Attemping ICMP ping of xxx.xxx.xxx.xxx...
INFO  31/01/2005 04:10:44 drbd kernel module loaded already on xxxxxxxxx
INFO  31/01/2005 04:10:44 Local DRBD devices started successfully.
INFO  31/01/2005 04:10:44 DRBD: Skipping ENBD decisioning and relying on meta data...
INFO  31/01/2005 04:10:44 Attempting to start DRBD services on xxxxxxxxx
INFO  31/01/2005 04:10:45 DRBD devices started successfully on xxxxxxxxx
INFO  31/01/2005 04:10:45 Validated consistency of available data for DRBD.
INFO  31/01/2005 04:10:45 Both data copies believed good.
WARN  31/01/2005 04:10:45 Locking services not available.
INFO  31/01/2005 04:10:45 Attempting to register application apache as starting...
INFO  31/01/2005 04:10:46 Application registered successfully as starting.
INFO  31/01/2005 04:10:46 Checking IP address for application is not in use...
INFO  31/01/2005 04:10:48 Required application IP address is not pingable - continuing.
INFO  31/01/2005 04:10:48 Attempting to make local DRBD devices primary...
INFO  31/01/2005 04:10:48 All local DRBD now primary.
INFO  31/01/2005 04:10:48 Running "/sbin/fsck -t ext3 -a /dev/drbd0"...
INFO  31/01/2005 04:10:48 Running "PATH=$PATH:/sbin:/bin:/usr/sbin; mount -t ext3 -o rw /dev/drbd0 /apache"...
INFO  31/01/2005 04:10:48 File systems mounted on DRBD devices.
INFO  31/01/2005 04:10:48 choose_interface: Link beat ok on interface eth0
INFO  31/01/2005 04:10:48 choose_interface: Assigning IP address xxx.xxx.xxx.xxx to interface eth0...
INFO  31/01/2005 04:10:48 cmd=/sbin/ifconfig eth0:1 inet xxx.xxx.xxx.xxx netmask 255.255.255.0 2>&1
INFO  31/01/2005 04:10:48 Running /sbin/cluster/tools/send_arp xxx.xxx.xxx.xxx xx:xx:xx:xx:xx:xx xxx.xxx.xxx.xxx FF:FF:FF:FF:FF:FF eth0 to send gratuitous arp request
INFO  31/01/2005 04:10:48 choose_interface: Successfully assigned IP address xxx.xxx.xxx.xxx to eth0:1
INFO  31/01/2005 04:10:48 choose_interface: Running IP level testing for interface eth0:1
INFO  31/01/2005 04:10:48 choose_interface: Test for IP xxx.xxx.xxx.xxx(tcp) was OK.
INFO  31/01/2005 04:10:48 choose_interface: IP level testing for interface eth0:1 succeeded
INFO  31/01/2005 04:10:48 Applications start completed successfully

but the output in apache.start.log is:

/apache/admin/scripts/apachectl start: httpd (pid 12088) already running

so there's no listener on port 80 of the cluster IP (clusternode1:80).

Also, so you know, if I start 2 by itself (without first starting 1), the 
listener on clusternode1:80 comes up fine.

So I decided to run a test on the following bit:

    # check for pidfile
    if [ -f $PIDFILE ] ; then
        PID=`cat $PIDFILE`
        if [ "x$PID" != "x" ] && kill -0 $PID 2>/dev/null ; then
            STATUS="httpd (pid $PID) running"
            RUNNING=1

in the /apache/admin/scripts/apachectl file. I changed RUNNING=1 to RUNNING=0 
to see what would happen, basically forcing the startup of apache on 
clusternode1. I then halted the app and stopped the local httpd.
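
A less blunt alternative to forcing RUNNING=0 would be to treat the pid as 
"running" only if it belongs to the clustered httpd. The following is a 
minimal sketch, not part of the stock apachectl: the function name 
pid_matches is made up for illustration, it assumes a Linux /proc file 
system, and the pattern used in the usage comment is taken from the ps 
output of the clustered instance.

```shell
#!/bin/sh
# pid_matches PIDFILE PATTERN  (hypothetical helper, not part of apachectl)
# Returns 0 only if PIDFILE names a live process whose command line
# contains PATTERN. Assumes Linux, where /proc/PID/cmdline is available.
pid_matches() {
    pidfile=$1
    pattern=$2
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # Same liveness test as the stock apachectl check.
    kill -0 "$pid" 2>/dev/null || return 1
    # Arguments in /proc/PID/cmdline are NUL-separated; turn them into spaces.
    cmdline=$(tr '\000' ' ' < "/proc/$pid/cmdline") || return 1
    case $cmdline in
        *"$pattern"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use inside the pidfile check (assumption, untested with linuxha.net):
#   if pid_matches "$PIDFILE" "/apache/admin/conf/httpd.conf"; then
#       RUNNING=1
#   else
#       RUNNING=0
#   fi
```

This only changes how the pid is interpreted. If both instances write to the 
same pid file, that would still need to be sorted out separately.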

I then restarted the local httpd (which establishes the local:80 listener) 
and ran clstartapp for the apache app, which successfully brought up 
clusternode1:80, as we can see:

tcp        0      0 clusternode1:80         0.0.0.0:*               LISTEN
tcp        0      0 local:80                0.0.0.0:*               LISTEN

and:

16683 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16686 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16687 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16688 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16689 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16690 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16691 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16692 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16693 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16839 ?        S      0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
16840 ?        S      0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
16841 ?        S      0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
16842 ?        S      0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
16843 ?        S      0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
16844 ?        S      0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
16845 ?        S      0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
16846 ?        S      0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
16848 ?        S      0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf

So the modification produced the result I wanted.

I tested the websites and they all worked fine.

Halting the apache app also stopped just the clusternode1:80 listener, again 
what I wanted.

I'm not sure what impact the modification above will have on the overall usage 
of linuxha.net, but it's the only way I was able to get this to work.
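
One thing that could explain the "already running" message, as Simon raised 
earlier in the thread, is both instances pointing at the same pid file. A 
minimal sketch for checking that follows. The function names pidfile_of and 
same_pidfile are made up for illustration, and the path 
/etc/httpd_local/conf/httpd.conf in the usage comment is an assumption about 
where the local instance keeps its config.

```shell
#!/bin/sh
# pidfile_of CONF  (hypothetical helper)
# Print the argument of the last active PidFile directive in CONF.
# Directive names are matched case-insensitively; commented-out lines
# do not match because their first field starts with '#'.
pidfile_of() {
    awk 'tolower($1) == "pidfile" { p = $2 } END { if (p != "") print p }' "$1"
}

# same_pidfile CONF_A CONF_B  (hypothetical helper)
# Return 0 if both configs set PidFile to the same literal string.
same_pidfile() {
    a=$(pidfile_of "$1")
    b=$(pidfile_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Possible use (paths are assumptions, adjust to the real layout):
#   if same_pidfile /etc/httpd_local/conf/httpd.conf \
#                   /apache/admin/conf/httpd.conf; then
#       echo "both instances name the same PidFile"
#   fi
```

The comparison is purely textual. A relative PidFile is resolved against the 
ServerRoot, and symlinks can make two different-looking paths point at the 
same file, so a "different" result here does not prove the files are separate. 
It also says nothing when a config has no PidFile line at all and the 
compiled-in default applies.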

One other thing of note, when I ran:

# service httpd stop

while both local:80 and clusternode1:80 were up, the clusternode1:80 httpd 
processes were killed instead of the local:80 ones, which is not what I 
wanted. Also, while this was the case, clstat showed the apache application 
as still running on clusternode1:

 Application       Node      State  Started  Monitor  Stale  Fail-over?
      apache      local    STARTED  0:00:00  Running      0         Yes

even though this isn't the case, since the only httpd processes left are:

16683 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16686 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16687 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16688 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16689 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16690 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16691 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16692 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local
16693 ?        S      0:00 /usr/sbin/httpd -d /etc/httpd_local

which is the local:80 instance of httpd and not clusternode1:80. I suspect 
the lems monitor is only checking whether any httpd daemons are running, and 
not specifically whether the running httpd is the local:80 or the 
clusternode1:80 instance.
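
A check that could tell the two instances apart would look at the listening 
address rather than the process name. The following is a minimal sketch 
only. How the lems monitor is actually configured is not shown here, the 
function name listens_on is made up for illustration, and it assumes the 
input is in the column layout printed by "netstat -ltn" (local address in 
the fourth column, state in the sixth).

```shell
#!/bin/sh
# listens_on ADDR:PORT  (hypothetical helper)
# Reads "netstat -ltn" style output on stdin and returns 0 if some socket
# in LISTEN state has exactly ADDR:PORT as its local address.
listens_on() {
    awk -v want="$1" '
        $6 == "LISTEN" && $4 == want { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Possible use, with 192.0.2.10 standing in for the cluster IP (assumption):
#   if netstat -ltn | listens_on 192.0.2.10:80; then
#       echo "clustered httpd is listening"
#   else
#       echo "clustered httpd is NOT listening"
#   fi
```

Because the match is exact, an httpd bound to the wildcard address 
(0.0.0.0:80) would not count as the clustered listener, which is the 
intended behaviour when each instance has its own Listen entry.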

I'd really like to get this working properly, so if you have any suggestions 
as to how I can achieve this I'd really appreciate it.

Michael.

> Hello Michael,
> 	Having two server roots (i.e. /etc/httpd and /etc/httpd_local) is 
> the way to go, I would guess. Initially copy all the files from one 
> to the other and then customise each by changing the Listen entry in 
> each to the local and clustered IP addresses as appropriate.
> 
> 	The -d option on httpd startup can then be used to specify the
> required server root (/etc/httpd or /etc/httpd_local).
> 
> Regards,
> Simon.
> 
> On Fri, 2005-01-28 at 14:10 +1000, Michael Mansour wrote:
> > Hi Simon,
> > 
> > Just so you know, I have only one apache instance which runs as the 
> > "apache" application starts up. I didn't realise I could have had two.
> > 
> > The way I have it set up is to not have any apache server started on 
> > system boot. I then form the cluster and start the apache app, which 
> > reads the other conf files from /etc/httpd/conf.d/*conf and starts up 
> > the virtual servers.
> > 
> > I'm going to think about how I can now have two apache instances 
> > running, the local one and the clustered one. Do I need to have two sets 
> > of /etc/httpd/conf.d directories in this case? One for the local and one 
> > for the clustered?
> > 
> > Thanks.
> > 
> > Michael.
> > 
> > > Hello Michael,
> > > 	From what you're saying, when you use clhaltapp on node1 both Apache
> > > daemons stop? If that is the case you will need to modify the script
> > > that is in place for the "stopscript" command for the application in
> > > the /etc/cluster/apache/appconf.xml. First thing to do is to test it:
> > > with the apache application running on node1 as well as your local
> > > Apache, run whatever the "stopscript" is. It should only stop the
> > > clustered Apache instance. If no httpd processes are left running 
> > > afterwards, you know that this script needs to be modified in some way.
> > > 
> > > 	Possibly both use the same pid file (/var/run/httpd.pid)?
> > > 
> > > Regards,
> > > Simon
> > 
> > 
> >
------- End of Original Message -------



