d-h-n.de
Blog about Oracle, Linux..

drbd.conf

October 31st, 2010 . by admin

Structure of the file /etc/drbd.conf

global {}

common {}

resource res-0 {}
...
resource res-n {}

where resource is:

resource name {
   on {}
   on {}
   startup {}
   syncer {}
   handlers {}
   net {}
   disk {}
   protocol {}
}

Each resource section:
- needs two on <host> sections
- may have a startup, syncer, handlers, net, and disk section
- must set the required protocol parameter.

where:

on alf {
    device    /dev/drbd0;
    disk      /dev/hdc5;
    address   192.168.22.12:7788;
    meta-disk internal;
    # or: meta-disk /dev/sdbX [index];
}
startup {
    wfc-timeout
    degr-wfc-timeout
    wait-after-sb
    become-primary-on both
}
syncer {
    rate, after, al-extents
}
handlers {
   pri-on-incon-degr
   pri-lost-after-sb
   pri-lost
   outdate-peer
   local-io-error
   split-brain
}
net {
   sndbuf-size, timeout, connect-int, ping-int, ping-timeout, max-buffers,
   max-epoch-size, ko-count, allow-two-primaries, cram-hmac-alg, shared-secret,
   after-sb-0pri, after-sb-1pri, after-sb-2pri
}
disk {
   on-io-error, size,  fencing,  use-bmbv,
   no-disk-flushes, no-md-flushes
}
protocol { A | B |  C }
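
Putting the sections together, a minimal two-node resource could look roughly like this
(sketch: the values for alf are taken from above; the second host bert and its IP address
are made-up placeholders, and protocol C / rate 10M are just example choices):

global { usage-count no; }

common { }

resource res0 {
    protocol C;
    on alf {
        device    /dev/drbd0;
        disk      /dev/hdc5;
        address   192.168.22.12:7788;
        meta-disk internal;
    }
    on bert {
        device    /dev/drbd0;
        disk      /dev/hdc5;
        address   192.168.22.13:7788;   # placeholder address for the second node
        meta-disk internal;
    }
    syncer { rate 10M; }
}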

Parameters:
pri-on-incon-degr: the node is primary on inconsistent data and degraded (cluster with only one node left)
pri-lost-after-sb: the node was primary but lost that role after split brain
wfc-timeout: wait-for-connection timeout
degr-wfc-timeout: wait-for-connection timeout for a degraded cluster
rate: resync rate; on 100 Mbit Ethernet you cannot expect more than about 12.5 MByte/s
protocol: A = asynchronous, B = memory synchronous (data has reached the peer), C = synchronous (data is on the disk of both nodes)
after-sb-0pri: policy when, after split brain, zero (no) node is primary
after-sb-1pri: policy when, after split brain, one node is primary and the other secondary
after-sb-2pri: policy when, after split brain, both nodes are primary
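
As an illustration of the split-brain policies and the syncer rate, a syncer/net section
might look like this (sketch only; the values and the shared secret are made-up examples,
not recommendations for a particular setup):

syncer {
    rate 10M;                               # roughly 10 MByte/s, leaves headroom on 100 Mbit Ethernet
}
net {
    cram-hmac-alg   sha1;
    shared-secret   "some-secret";          # placeholder secret
    after-sb-0pri   discard-zero-changes;   # sync from the node that wrote data to the node with zero changes
    after-sb-1pri   discard-secondary;      # the secondary's data is discarded, it becomes sync target
    after-sb-2pri   disconnect;             # two primaries: do not auto-resolve, just disconnect
}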

For more, see:
man drbd.conf
man drbdsetup
less /etc/drbd.conf


heartbeat – ERROR: Couldn’t unmount /mydata, giving up!

October 22nd, 2010 . by admin

On my test cluster with heartbeat, drbd and nfs, when I try to switch a node to standby, i.e. when I run:

deb5:~# /usr/lib/heartbeat/hb_standby
2010/10/20_14:36:25 Going standby [all].

It does not work! The node reboots!

The log file /var/log/ha-log shows:

heartbeat[4710]: 2010/10/20_14:36:26 info: deb5 wants to go standby [all]
heartbeat[4710]: 2010/10/20_14:36:26 info: standby: deb6 can take our all resources
heartbeat[5813]: 2010/10/20_14:36:26 info: give up all HA resources (standby).
ResourceManager[5826]:  2010/10/20_14:36:26 info: Releasing resource group: deb5 IPaddr::192.168.37.11/24/eth0 drbddisk::res0 Filesystem::/dev/drbd0::/mydata::ext3 nfs-kernel-server
ResourceManager[5826]:  2010/10/20_14:36:26 info: Running /etc/init.d/nfs-kernel-server  stop
ResourceManager[5826]:  2010/10/20_14:36:26 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mydata ext3 stop
Filesystem[5900]:       2010/10/20_14:36:26 INFO: Running stop for /dev/drbd0 on /mydata
Filesystem[5900]:       2010/10/20_14:36:26 INFO: Trying to unmount /mydata
Filesystem[5900]:       2010/10/20_14:36:26 ERROR: Couldn't unmount /mydata;trying cleanup with SIGTERM
Filesystem[5900]:       2010/10/20_14:36:26 INFO: No processes on /mydata were signalled
Filesystem[5900]:       2010/10/20_14:36:27 ERROR: Couldn't unmount /mydata;trying cleanup with SIGTERM
Filesystem[5900]:       2010/10/20_14:36:27 INFO: No processes on /mydata were signalled
Filesystem[5900]:       2010/10/20_14:36:28 ERROR: Couldn't unmount /mydata;trying cleanup with SIGTERM
Filesystem[5900]:       2010/10/20_14:36:28 INFO: No processes on /mydata were signalled
Filesystem[5900]:       2010/10/20_14:36:29 ERROR: Couldn't unmount /mydata; trying cleanup with SIGKILL
Filesystem[5900]:       2010/10/20_14:36:29 INFO: No processes on /mydata were signalled
Filesystem[5900]:       2010/10/20_14:36:30 ERROR: Couldn't unmount /mydata;trying cleanup with SIGKILL
Filesystem[5900]:       2010/10/20_14:36:31 INFO: No processes on /mydata were signalled
Filesystem[5900]:       2010/10/20_14:36:32 ERROR: Couldn't unmount /mydata;trying cleanup with SIGKILL
Filesystem[5900]:       2010/10/20_14:36:32 INFO: No processes on /mydata were signalled
Filesystem[5900]:       2010/10/20_14:36:33 ERROR: Couldn't unmount /mydata, giving up!
Filesystem[5889]:       2010/10/20_14:36:33 ERROR:  Generic error

Problem: When heartbeat tries to stop the resources here, it runs into trouble unmounting /mydata and stopping the nfsd daemon.
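
A quick way to see from the shell what is still keeping /mydata busy is fuser or lsof
(standard tools, not part of heartbeat); note that the in-kernel nfsd may not show up
there at all, which matches the "No processes on /mydata were signalled" lines in the log:

deb5:~# fuser -vm /mydata       # list processes using the mounted filesystem
deb5:~# lsof /mydata            # alternative: open files on that filesystem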

After a long search on the web I found this:
http://www.linux-ha.org/HaNFS
see: Hint 3

For my environment (Debian) I tried this:

# vi /etc/init.d/nfs-kernel-server
 stop)
        log_daemon_msg "Stopping $DESC"

        log_progress_msg "mountd"
        start-stop-daemon --stop --oknodo --quiet \
            --name rpc.mountd --user 0
        if [ $? != 0 ]; then
                log_end_msg $?
                exit $?
        fi

        if [ "$NEED_SVCGSSD" = "yes" ]; then
                log_progress_msg "svcgssd"
                start-stop-daemon --stop --oknodo --quiet \
                    --name rpc.svcgssd --user 0
                if [ $? != 0 ]; then
                        log_end_msg $?
                        exit $?
                fi
        fi

        log_progress_msg "nfsd"
        start-stop-daemon --stop --oknodo --quiet \
            --name nfsd --user 0 --signal 9
        if [ $? != 0 ]; then
                log_end_msg $?
                exit $?
        fi

..and it works!
I can now switch a node (via hb_standby/hb_takeover) from active to passive and vice versa!
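
To double-check the failover in both directions, a rough sequence (using only commands
that already appear in these posts) would be:

deb5:~# /usr/lib/heartbeat/hb_standby        # hand all resources over to deb6
deb6:~# cat /proc/drbd                       # deb6 should now show Primary/Secondary
deb6:~# mount | grep mydata                  # /dev/drbd0 should be mounted on /mydata
deb5:~# /usr/lib/heartbeat/hb_takeover       # take the resources back to deb5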


DRBD – /proc/drbd – How to read?

October 22nd, 2010 . by admin

hb1:~# cat /proc/drbd
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:104 nr:0 dw:104 dr:157 al:5 bm:0 lo:0 pe:0 ua:0 ap:0
	resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
	act_log: used:0/257 hits:21 misses:5 starving:0 dirty:0 changed:5

cs: Connection state (WFConnection, StandAlone, Connected, ...)
st: State (DRBD before 8.3; in 8.3+ this field is called ro:)
ro: Roles of local node/peer (Primary, Secondary, ...)
ds: Disk states (UpToDate, DUnknown, ...)

ns: Network send (data sent to the peer)
nr: Network receive (data received from the peer)

For more, see: http://www.drbd.org/users-guide/ch-admin.html
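
Instead of reading /proc/drbd by hand, the same values can be queried per resource with
drbdadm (sketch; res0 is the resource name used in the heartbeat post, and on newer DRBD
releases "drbdadm state" has been renamed to "drbdadm role"):

hb1:~# drbdadm cstate res0      # connection state, e.g. Connected
hb1:~# drbdadm state res0       # roles, e.g. Primary/Secondary
hb1:~# drbdadm dstate res0      # disk states, e.g. UpToDate/UpToDate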