[Linuxha-users] Caught in a loop
Simon Edwards
simon.edwards at linuxha.net
Mon Jan 31 20:52:11 GMT 2005
Hi Michael,
On the lems front, simply change the
"<process_string>httpd</process_string>" line in the "httpd.xml" file so
that it matches only the clustered httpd processes, for example:
<process_string>httpd -f /apache/admin/conf/httpd.conf</process_string>
That should take care of that one.
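
To illustrate why the longer string helps, here is a rough sketch only. It assumes the monitor does a plain substring match against each process command line, which is an assumption about lems rather than something confirmed in this thread. count_matching is a made-up helper, and the listing is a cut-down copy of the ps output quoted below:

```shell
# Hypothetical helper: count the lines of a process listing (read from
# stdin) that contain the given fixed string.
count_matching() {
    grep -F -c -- "$1"
}

# Sample listing: one local httpd and two clustered httpd processes.
listing='16683 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
16839 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
16840 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf'

# The bare "httpd" string matches both instances ...
loose=$(printf '%s\n' "$listing" | count_matching 'httpd')
# ... while the full command line matches only the clustered one.
tight=$(printf '%s\n' "$listing" | count_matching 'httpd -f /apache/admin/conf/httpd.conf')

echo "loose=$loose tight=$tight"
```

With the bare string, the local httpd alone would be enough to keep the monitor satisfied, which would explain clstat still reporting the application as running.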
As for the /apache/admin/scripts/apachectl script, it is possibly worth
changing the PIDFILE setting rather than setting RUNNING to 0, i.e.:
PIDFILE=/var/run/httpd-cluster.pid
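
As a sketch of why a separate pid file should avoid the "already running" message: the check below is modelled on the pidfile test quoted further down, but is_running and the temporary directory are made up for the illustration, and the current shell's PID stands in for the local httpd:

```shell
# Hypothetical helper: succeeds only if the pid file exists and the PID
# stored in it belongs to a live process (same idea as the apachectl check).
is_running() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

tmp=$(mktemp -d)
# Stand-in for the pid file written by the local instance.
echo $$ > "$tmp/httpd.pid"

# If the cluster script looks at the same file, it sees the local instance.
if is_running "$tmp/httpd.pid"; then shared=running; else shared=stopped; fi
# With its own pid file, it correctly sees nothing running yet.
if is_running "$tmp/httpd-cluster.pid"; then separate=running; else separate=stopped; fi

echo "shared=$shared separate=$separate"
rm -rf "$tmp"
```

Note that httpd itself writes its pid to whatever the PidFile directive in its configuration names, so the clustered httpd.conf would presumably need to point at the same /var/run/httpd-cluster.pid for the script and the daemon to agree.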
Hope this works for you! Let me know if not.
Regards,
Simon.
On Mon, 2005-01-31 at 14:50 +1000, Michael Mansour wrote:
> Hi Simon,
>
> After thinking about this a lot and running through some of my own setups for
> it, I couldn't get it to work. What I tried was to have a separate
> /etc/httpd/conf.d.cluster directory together with the normal /etc/httpd/conf
> directory.
>
> So I decided to test your suggestion below, but found this didn't work either.
> The report shown in /var/log/cluster/apache.start.log is:
>
> /apache/admin/scripts/apachectl start: httpd (pid 9605) already running
>
> so it seemed no attempt at running another port 80 listener was made.
>
> I'll explain this test setup for you. I set this up as:
>
> 1. /etc/httpd_local be the ServerRoot for the local httpd (local:80)
>
> 2. /etc/httpd be the ServerRoot for the cluster (clusternode1:80)
>
> For 1, it starts up fine using the normal OS start scripts "service httpd
> start":
>
> 9605 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 9608 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 9609 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 9610 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 9611 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 9612 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 9613 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 9614 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
>
> and:
>
> tcp 0 0 local:80 0.0.0.0:* LISTEN
>
> But when I then execute 2, the apache app starts up fine with "clstartapp":
>
> [root at node1 conf]# clstartapp -A apache -V
> INFO 31/01/2005 04:10:43 Validated checksum for cluster configuration
> INFO 31/01/2005 04:10:43 Checked that node names resolve to IP addresses
> INFO 31/01/2005 04:10:43 Validated Build run has completed against this
> configuration.
> INFO 31/01/2005 04:10:43 drbd kernel module loaded already on xxx.xxx.xxx.xxx
> INFO 31/01/2005 04:10:43 Checking heartbeats for any sign of life...
> WARN 31/01/2005 04:10:43 Attemping ICMP ping of xxx.xxx.xxx.xxx...
> INFO 31/01/2005 04:10:44 drbd kernel module loaded already on xxxxxxxxx
> INFO 31/01/2005 04:10:44 Local DRBD devices started successfully.
> INFO 31/01/2005 04:10:44 DRBD: Skipping ENBD decisioning and relying on meta
> data...
> INFO 31/01/2005 04:10:44 Attempting to start DRBD services on xxxxxxxxx
> INFO 31/01/2005 04:10:45 DRBD devices started successfully on xxxxxxxxx
> INFO 31/01/2005 04:10:45 Validated consistency of available data for DRBD.
> INFO 31/01/2005 04:10:45 Both data copies believed good.
> WARN 31/01/2005 04:10:45 Locking services not available.
> INFO 31/01/2005 04:10:45 Attempting to register application apache as
> starting...
> INFO 31/01/2005 04:10:46 Application registered successfully as starting.
> INFO 31/01/2005 04:10:46 Checking IP address for application is not in use...
> INFO 31/01/2005 04:10:48 Required application IP address is not pingable -
> continuing.
> INFO 31/01/2005 04:10:48 Attempting to make local DRBD devices primary...
> INFO 31/01/2005 04:10:48 All local DRBD now primary.
> INFO 31/01/2005 04:10:48 Running "/sbin/fsck -t ext3 -a /dev/drbd0"...
> INFO 31/01/2005 04:10:48 Running "PATH=$PATH:/sbin:/bin:/usr/sbin; mount -t
> ext3 -o rw /dev/drbd0 /apache"...
> INFO 31/01/2005 04:10:48 File systems mounted on DRBD devices.
> INFO 31/01/2005 04:10:48 choose_interface: Link beat ok on interface eth0
> INFO 31/01/2005 04:10:48 choose_interface: Assigning IP address xxx.xxx.xxx.
> xxx to interface eth0...
> INFO 31/01/2005 04:10:48 cmd=/sbin/ifconfig eth0:1 inet xxx.xxx.xxx.xxx
> netmask 255.255.255.0 2>&1
> INFO 31/01/2005 04:10:48 Running /sbin/cluster/tools/send_arp xxx.xxx.xxx.xxx
> xx:xx:xx:xx:xx:xx xxx.xxx.xxx.xxx FF:FF:FF:FF:FF:FF eth0 to send gratuitous
> arp request
> INFO 31/01/2005 04:10:48 choose_interface: Successfully assigned IP address
> xxx.xxx.xxx.xxx to eth0:1
> INFO 31/01/2005 04:10:48 choose_interface: Running IP level testing for
> interface eth0:1
> INFO 31/01/2005 04:10:48 choose_interface: Test for IP xxx.xxx.xxx.xxx(tcp)
> was OK.
> INFO 31/01/2005 04:10:48 choose_interface: IP level testing for interface
> eth0:1 succeeded
> INFO 31/01/2005 04:10:48 Applications start completed successfully
>
> but the output in apache.start.log is:
>
> /apache/admin/scripts/apachectl start: httpd (pid 12088) already running
>
> so there's no listen on port 80 of the cluster IP (clusternode1:80).
>
> So you know also, if I start 2 by itself (without first starting 1), the
> listen on clusternode1:80 works.
>
> So what I decided to do was a test, for the following bit:
>
> # check for pidfile
> if [ -f $PIDFILE ] ; then
>     PID=`cat $PIDFILE`
>     if [ "x$PID" != "x" ] && kill -0 $PID 2>/dev/null ; then
>         STATUS="httpd (pid $PID) running"
>         RUNNING=1
>
> in the /apache/admin/scripts/apachectl file, I modified it to RUNNING=0 to see
> what would happen, basically forcing the startup of apache on clusternode1. I
> then halted the app and stopped the local httpd.
>
> I then restarted the local httpd (which establishes a local:80) and then
> clstartapp of the apache app, which successfully loaded the clusternode1:80 as
> we can see:
>
> tcp 0 0 clusternode1:80 0.0.0.0:* LISTEN
> tcp 0 0 local:80 0.0.0.0:* LISTEN
>
> and:
>
> 16683 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16686 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16687 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16688 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16689 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16690 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16691 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16692 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16693 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16839 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
> 16840 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
> 16841 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
> 16842 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
> 16843 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
> 16844 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
> 16845 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
> 16846 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
> 16848 ? S 0:00 /usr/sbin/httpd -f /apache/admin/conf/httpd.conf
>
> As we can see, the modification produced the result I wanted.
>
> I tested the websites and they all worked fine.
>
> Halting the apache app also stopped just the clusternode1:80 listener, again
> what I wanted.
>
> I'm not sure what impact the modification above will make to the overall usage
> of linuxha.net, but it's the only way I was able to get this to work.
>
> One other thing of note, when I ran:
>
> # service httpd stop
>
> while both local:80 and clusternode1:80 were up, the clusternode1:80 httpd
> processes were killed instead of local:80, not something I wanted. Also, while
> this was the case clstat showed the apache application was still running on
> clusternode1:
>
> Application  Node   State    Started  Monitor  Stale  Fail-over?
> apache       local  STARTED  0:00:00  Running  0      Yes
>
> even though this isn't the case since we see:
>
> 16683 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16686 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16687 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16688 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16689 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16690 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16691 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16692 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
> 16693 ? S 0:00 /usr/sbin/httpd -d /etc/httpd_local
>
> which is the local:80 instance of httpd and not clusternode1:80 - I suspect
> the lems monitor is only assessing whether httpd daemons are running, and not
> specifically whether the httpd running is local:80 or clusternode1:80.
>
> I'd really like to get this working properly, so if you have any suggestions
> as to how I can achieve this I'd really appreciate it.
>
> Michael.
>
> > Hello Michael,
> > Having two server roots (i.e. /etc/httpd and /etc/httpd_local) is
> > the way to go, I would guess. Initially copy all the files from one
> > to the other, and then customise each by changing the Listen entry in
> > each to the local or clustered IP address as appropriate.
> >
> > The -d option on httpd startup can then be used to specify the
> > required server root (/etc/httpd or /etc/httpd_local).
> >
> > Regards,
> > Simon.
> >
> > On Fri, 2005-01-28 at 14:10 +1000, Michael Mansour wrote:
> > > Hi Simon,
> > >
> > > Just so you know, I have only one apache instance which runs as the
> > > "apache" application starts up. I didn't realise I could have had two.
> > >
> > > The way I have it set up is to not have any apache server started on
> > > system boot. I then form the cluster and start the apache app, which reads
> > > the other conf files from /etc/httpd/conf.d/*conf and starts up the
> > > virtual servers.
> > >
> > > I'm going to think about how I can now have two apache instances running,
> > > the local one and the clustered one. Do I need to have two sets of
> > > /etc/httpd/conf.d directories in this case? One for the local and one for
> > > the clustered?
> > >
> > > Thanks.
> > >
> > > Michael.
> > >
> > > > Hello Michael,
> > > > From what you're saying, when you use clhaltapp on node1 both Apache
> > > > daemons stop? If that is the case you will need to modify the script
> > > > that is in place for the "stopscript" command for the application in
> > > > /etc/cluster/apache/appconf.xml. The first thing to do is to test it:
> > > > with the apache application running on node1 as well as your local
> > > > Apache, run whatever the "stopscript" is. It should only stop the
> > > > clustered Apache instance. If no httpd processes are left running, you
> > > > now know that this script needs to be modified in some way.
> > > >
> > > > Possibly both use the same pid file (/var/run/httpd.pid)?
> > > >
> > > > Regards,
> > > > Simon
> > >
> > >
> > >
> ------- End of Original Message -------
>
>
>