[Wed Nov 9 18:01:44 PST 2005]
This is a brain dump summarizing the effort that has gone into making the sccs_lockfile() interface work over NFS, and an attempt to capture what I’ve learned in the process.
First, the test setup. I have something I call a string mapper (smapper) in /home/bk/lm/bitcluster/cmd/smapper on work. It’s somewhat similar to C Linda which was a distributed tuple space that never caught on. You can think of it as a network interface to an MDBM, or a network based registry. You can talk to it with telnet to run commands.
Our test setup uses only two of the commands, "setx" and "rmx". setx maps to mdbm_store(…, MDBM_INSERT), which fails if the entry already exists; rmx is a specialized form of mdbm_delete where we pass both the key and the value, and the delete succeeds if and only if both match.
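In MDBM terms the two commands boil down to something like the sketch below. This is just an illustration of the semantics, not the smapper source; the do_setx()/do_rmx() names are made up.

	#include <string.h>
	#include <mdbm.h>

	/* setx: insert only, fail if the key is already there */
	int
	do_setx(MDBM *db, char *key, char *val)
	{
		datum	k = { key, strlen(key) + 1 };
		datum	v = { val, strlen(val) + 1 };

		/* MDBM_INSERT means the store fails if the key exists */
		return (mdbm_store(db, k, v, MDBM_INSERT));
	}

	/* rmx: delete only if both the key and the value match */
	int
	do_rmx(MDBM *db, char *key, char *val)
	{
		datum	k = { key, strlen(key) + 1 };
		datum	v = mdbm_fetch(db, k);

		if (!v.dptr || (v.dsize != (int)(strlen(val) + 1)) ||
		    memcmp(v.dptr, val, v.dsize)) {
			return (-1);	/* missing, or someone else's value */
		}
		return (mdbm_delete(db, k));
	}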
This forms the basis for a network-based lock manager. The invariants are that, if all accesses are properly serialized, setx should always succeed (nobody else has the entry) and rmx should always succeed as well (nobody stole the entry out from under us).
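As a concrete illustration of the invariant: if two clients ever hold the lock at the same time, the second setx on the same key fails because the entry already exists. The session below is hypothetical; the exact smapper command syntax and replies are made up here, and the stored value is assumed to be something like host.pid:

	setx /home/tmp_aix/LOCK hp.1234
	OK				<- first holder wins
	setx /home/tmp_aix/LOCK sgi.5678
	ERROR				<- key exists; if both clients think they
					   hold the lock, that is a violation
	rmx /home/tmp_aix/LOCK hp.1234
	OK				<- key and value match, delete succeeds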
The client side of the test setup, run with bk _locktest -n path/to/lock, has the following loop body:
	sccs_lockfile();
	setx();		// sets the lock, should succeed
	// sleep a while so other people can try and get in
	rmx();
	sccs_unlockfile();
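Spelled out with the invariant checks, the loop body looks roughly like this. It is only a sketch: the sccs_lockfile()/sccs_unlockfile() argument lists are simplified and the setx()/rmx() client stubs stand in for whatever the real calls that talk to smapper are named.

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	/* assumed interfaces; signatures simplified for this sketch */
	extern int sccs_lockfile(char *lock);
	extern int sccs_unlockfile(char *lock);
	extern int setx(char *key, char *val);	/* talks to smapper */
	extern int rmx(char *key, char *val);	/* talks to smapper */

	void
	locktest(char *lock, char *id, int iterations)
	{
		int	i;

		for (i = 0; i < iterations; i++) {
			if (sccs_lockfile(lock)) {
				fprintf(stderr, "cannot lock %s\n", lock);
				exit(1);
			}
			/* we hold the file lock, so the insert must succeed */
			if (setx(lock, id)) {
				fprintf(stderr, "LOCKING VIOLATION: setx failed\n");
				exit(1);
			}
			sleep(1);	/* a while, so other people can try and get in */
			/* nobody should have been able to steal the entry */
			if (rmx(lock, id)) {
				fprintf(stderr, "LOCKING VIOLATION: rmx failed\n");
				exit(1);
			}
			sccs_unlockfile(lock);
		}
	}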
and we can run this across the unix part of the build cluster (I have a "unix" entry in my .clusters so you can do clogin unix) on an NFS file. I use /home/tmp_aix/LOCK.
Now the results and all the weirdnesses.
The test exercises things that are out of our control, i.e., the NFS client and server implementations. We own the server side; that’s work. The clients are whatever we have in our build cluster, and the following have NFS implementations that don’t work:
	redhat52 (Linux 2.0.36, a 1998 kernel)
	hp       (HPUX 10.20, a 1999 kernel)
	sgi      (IRIX 6.5, a 2001 kernel, shame on them)
	redhat9  (Linux 2.4.20, probably redhat's fault)
In fairness to SGI, they do work better than the other two but putting them in the mix will eventually cause a failure.
All of those platforms are "don’t cares" except for redhat9. That one is probably OK because I think it is a kernel bug caused by load.
Here are the weirdnesses I saw during testing:
- OpenBSD sometimes said that the link failed when it actually worked. Fix: ignore the link status and count on stat() to tell us what we need (see the sketch after this list).
- I don’t remember which platform, but sometimes I saw that the link worked, the inode numbers were the same, but the link count was 1. Fix: loop on the inodes being equal and the link count being 1.
- Some NFS implementations handle deletes by renaming to .nfs… and I’ve seen that turn into a link count of 3: the deleted file, the uniq file, and the lock. Fix: toss the unique file and retry.
- It is quite common to fail one of the conditions even though we really won, so we end up in sccs_stalelock() while we are in fact the owner of the lock. Fix: sleep a while and retry; that was easier than unraveling the state and deciding we had won.
- redhat9 would sometimes get the lock incorrectly, usually after hanging in the kernel for a while. Sometimes it hung forever.
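Putting those fixes together, the acquisition/verification step ends up looking roughly like the sketch below: link a unique temp file to the lock name, ignore link()'s return value, and believe stat() on both names instead. This is a simplified illustration of the approach, not the actual sccs_lockfile() code; trylock() and its retry count are made up.

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/stat.h>

	/*
	 * Try to take "lock" by linking our unique file to it.
	 * Returns 0 if we got it, -1 if the caller should make a
	 * fresh unique file and try again.
	 */
	int
	trylock(char *uniq, char *lock)
	{
		struct	stat su, sl;
		int	tries;

		for (tries = 0; tries < 10; tries++) {
			(void)link(uniq, lock);	/* OpenBSD lies; ignore the status */
			if (stat(uniq, &su)) break;
			if (stat(lock, &sl)) continue;	/* nothing there, link again */

			if (su.st_ino == sl.st_ino) {
				if (su.st_nlink == 2) return (0);	/* we own it */
				if (su.st_nlink >= 3) break;	/* .nfs silly rename; toss uniq */
				/* same inode but nlink == 1: loop until it settles */
			}
			/*
			 * Lost one of the conditions even though we may have won;
			 * sleeping and retrying is easier than unraveling the state.
			 */
			sleep(1);
		}
		unlink(uniq);	/* toss the unique file; caller retries */
		return (-1);
	}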
But that’s about it. If I toss out the lame client side platforms I can run
	(
	for i in 1 2 3 4 5 6 7 8 9 0
	do	./bk _locktest -n /home/tmp_aix/LOCK 100 &
	done
	wait
	echo All done
	)
on all the other platforms over and over and it works. It’s sort of cool in that it generates a huge load on work:
	load free cach swap pgin pgou dk0 dk1 dk2 dk3 ipkt opkt int ctx usr sys idl
	5.75    0    0    0    0    0   0   0   0   0  55K  50K   0   0   1 123  76
	5.75    0    0    0    0    0   0   0   0   0  55K  50K   0   0   1 120  79
	  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
	20696 root      15   0     0    0    0 S 23.2  0.0  27:07.96 nfsd
	20691 root      15   0     0    0    0 S 22.9  0.0  27:30.91 nfsd
	20695 root      16   0     0    0    0 R 22.9  0.0  26:20.44 nfsd
	20693 root      16   0     0    0    0 R 22.9  0.0  25:08.86 nfsd
	20689 root      15   0     0    0    0 S 22.5  0.0  26:18.06 nfsd
	20694 root      16   0     0    0    0 R 21.9  0.0  25:46.82 nfsd
	20690 root      16   0     0    0    0 R 21.9  0.0  26:45.03 nfsd
	20692 root      16   0     0    0    0 R 21.2  0.0  26:48.83 nfsd