Richard P
2015-01-22 19:04:59 UTC
There appears to be a race condition in amd where if an unmount is in progress
and then a request comes in to mount the same fs (via another map entry) the
second mount can end up referencing an unmounted fs. You reproduce this with
df /mountpoint1; sleep 300; df /mountpoint2 - you should see the second df
displaying the root fs (might take a couple of goes.)
I have a naiive patch below that resolves this:
--- amd-6.1.5.orig/amd/autil.c 2009-06-04 11:06:44.000000000 -0400
+++ amd-6.1.5/amd/autil.c 2015-01-22 01:45:56.000000000 -0500
@@ -688,6 +688,11 @@
}
/*
+ * Clear the mounted flag in case there is a pending mount for the same fs
+ */
+ mf->mf_flags &= ~MFF_MOUNTED;
+
+ /*
* If this is a pseudo-directory then adjust the link count
* in the parent
*/
Also, while investigating this I found that amd does an unnecessary fork when
mounting a node that refers to an already-mounted fs. This is the proposed
patch (without re-indentation):
--- amd-6.1.5.orig/amd/amfs_generic.c 2009-08-11 05:57:18.000000000 -0400
+++ amd-6.1.5/amd/amfs_generic.c 2015-01-22 03:02:16.000000000 -0500
@@ -844,6 +844,11 @@
#endif /* HAVE_FS_AUTOFS */
already_mounted:
+#ifdef HAVE_FS_AUTOFS
+ if (mp->am_flags & AMF_AUTOFS || !(mf->mf_flags & MFF_MOUNTED)) {
+#else
+ if (!(mf->mf_flags & MFF_MOUNTED)) {
+#endif /* HAVE_FS_AUTOFS */
mf->mf_flags |= MFF_MOUNTING;
if (mf->mf_fsflags & FS_MBACKGROUND) {
dlog("backgrounding mount of \"%s\"", mf->mf_mount);
@@ -861,6 +866,7 @@
}
mf->mf_flags &= ~MFF_MOUNTING;
+ }
if (this_error > 0)
goto failed;
if (this_error == 0) {
--- amd-6.1.5.orig/amd/map.c 2009-08-11 05:57:18.000000000 -0400
+++ amd-6.1.5/amd/map.c 2015-01-22 02:39:03.000000000 -0500
@@ -765,8 +765,7 @@
error = autofs_mount_fs(mp, mf);
else
#endif /* HAVE_FS_AUTOFS */
- if (!(mf->mf_flags & MFF_MOUNTED))
- error = mf->mf_ops->mount_fs(mp, mf);
+ error = mf->mf_ops->mount_fs(mp, mf);
if (error > 0)
dlog("mount_node: call to mf_ops->mount_fs(%s) failed: %s",
Not sure if these patches are correct but we'd definitely like to see
a fix for the race condition. I've included some (bowdlerized) logs below
illustrating the problem:
Jan 20 08:16:33 client amd[4623]: Timing out automount points...
Jan 20 08:16:33 client amd[4623]: "/mountpoint1" on /a/server/share timed out
Jan 20 08:16:33 client amd[4623]: Trying unmount in background
Jan 20 08:16:33 client amd[4623]: forked process 33532
Jan 20 08:16:33 client amd[33532]: backgrounded
Jan 20 08:16:33 client amd[33532]: backgrounded
Jan 20 08:16:33 client amd[4623]: Next mount timeout in 120s
Jan 20 08:16:33 client amd[33532]: Unmounting </mountpoint1>
</a/server/share> (server:/share) flags 15
Jan 20 08:16:33 client amd[33532]: attempting nfs umount
Jan 20 08:16:33 client amd[4623]: Discarding filehandle for server:/share
Jan 20 08:16:33 client amd[4623]: Select waits for 120s
Jan 20 08:16:33 client amd[33532]: Trying unmount(/a/server/share)
Jan 20 08:16:33 client amd[33532]: unlock_mntlist: releasing
-%-
Jan 20 08:16:33 client amd[4623]: NFS: mounting remote server
"server", remote fs "/share" on "/a/server/share"
Jan 20 08:16:33 client amd[4623]: Locating mntfs reference to
(/a/server/share,server:/share)
Jan 20 08:16:33 client amd[4623]: mf->mf_flags = 0x115
Jan 20 08:16:33 client amd[4623]: Got a hit with nfs
Jan 20 08:16:33 client amd[4623]: free_mntfs <//nil//> type error
mf_refc 1 flags 0
Jan 20 08:16:33 client amd[4623]: Immediately discarding mntfs for //nil//
Jan 20 08:16:33 client amd[4623]: in amfs_generic_mount_child
Jan 20 08:16:33 client amd[4623]: free_loc 0x2999f50
Jan 20 08:16:33 client amd[4623]: free_mntfs <//nil//> type error
mf_refc 1 flags 0
Jan 20 08:16:33 client amd[4623]: Immediately discarding mntfs for //nil//
Jan 20 08:16:33 client amd[4623]: mount of "server:/share" already pending
Jan 20 08:16:33 client amd[4623]: will retry ...
Jan 20 08:16:33 client amd[4623]: Arranging to retry mount of /mountpoint2
Jan 20 08:16:33 client amd[4623]: SLEEP on 0x36f7230
Jan 20 08:16:33 client amd[4623]: Saving RPC transport for /mountpoint2
Jan 20 08:16:33 client amd[4623]: Timing out automount points...
Jan 20 08:16:33 client amd[4623]: Next mount timeout in 217s
Jan 20 08:16:33 client amd[4623]: Select waits for 2s
Jan 20 08:16:33 client amd[33532]: unmount(/a/server/share) succeeded
Jan 20 08:16:33 client amd[33532]: Finished unmount(/a/server/share)
Jan 20 08:16:33 client amd[33532]: unlock_mntlist: releasing
Jan 20 08:16:33 client amd[4623]: Process 33532 exited with status 0
Jan 20 08:16:33 client amd[4623]: select interrupted
Jan 20 08:16:33 client amd[4623]: free_loc 0x203c2b0
Jan 20 08:16:33 client amd[4623]: free_mntfs </a/server/share> type
nfs mf_refc 3 flags 11
Jan 20 08:16:33 client amd[4623]: Commencing retry for mount of /mountpoint2
Jan 20 08:16:33 client amd[4623]: free_loc 0x2658380
Jan 20 08:16:33 client amd[4623]: free_mntfs </a/server/share> type
nfs mf_refc 2 flags 11
Jan 20 08:16:33 client amd[4623]: duplicate mount of "server:/share" ...
Jan 20 08:16:33 client amd[4623]: backgrounding mount of "/a/server/share"
Jan 20 08:16:33 client amd[4623]: forked process 33534
Jan 20 08:16:33 client amd[33534]: backgrounded
Jan 20 08:16:33 client amd[4623]: Timing out automount points...
Jan 20 08:16:33 client amd[4623]: Next mount timeout in 217s
Jan 20 08:16:33 client amd[4623]: Select waits for 217s
Jan 20 08:16:33 client amd[4623]: Process 33534 exited with status 0
Jan 20 08:16:33 client amd[4623]: select interrupted
Jan 20 08:16:33 client amd[4623]: amfs_cont: '/mountpoint2'
Jan 20 08:16:33 client amd[4623]: Mounting /mountpoint2 returned success
Jan 20 08:16:33 client amd[4623]: server:/share referenced fstype nfs
on /a/server/share
Jan 20 08:16:33 client amd[4623]: /mountpoint2 set to timeout in 300 seconds
Jan 20 08:16:33 client amd[4623]: Quick reply sent for /a/server/share
Jan 20 08:16:33 client amd[4623]: free_continuation
Jan 20 08:16:33 client amd[4623]: free_loc 0x2658380
Jan 20 08:16:33 client amd[4623]: free_mntfs </a/server/share> type
nfs mf_refc 2 flags 11
and then a request comes in to mount the same fs (via another map entry) the
second mount can end up referencing an unmounted fs. You reproduce this with
df /mountpoint1; sleep 300; df /mountpoint2 - you should see the second df
displaying the root fs (might take a couple of goes.)
I have a naiive patch below that resolves this:
--- amd-6.1.5.orig/amd/autil.c 2009-06-04 11:06:44.000000000 -0400
+++ amd-6.1.5/amd/autil.c 2015-01-22 01:45:56.000000000 -0500
@@ -688,6 +688,11 @@
}
/*
+ * Clear the mounted flag in case there is a pending mount for the same fs
+ */
+ mf->mf_flags &= ~MFF_MOUNTED;
+
+ /*
* If this is a pseudo-directory then adjust the link count
* in the parent
*/
Also, while investigating this I found that amd does an unnecessary fork when
mounting a node that refers to an already-mounted fs. This is the proposed
patch (without re-indentation):
--- amd-6.1.5.orig/amd/amfs_generic.c 2009-08-11 05:57:18.000000000 -0400
+++ amd-6.1.5/amd/amfs_generic.c 2015-01-22 03:02:16.000000000 -0500
@@ -844,6 +844,11 @@
#endif /* HAVE_FS_AUTOFS */
already_mounted:
+#ifdef HAVE_FS_AUTOFS
+ if (mp->am_flags & AMF_AUTOFS || !(mf->mf_flags & MFF_MOUNTED)) {
+#else
+ if (!(mf->mf_flags & MFF_MOUNTED)) {
+#endif /* HAVE_FS_AUTOFS */
mf->mf_flags |= MFF_MOUNTING;
if (mf->mf_fsflags & FS_MBACKGROUND) {
dlog("backgrounding mount of \"%s\"", mf->mf_mount);
@@ -861,6 +866,7 @@
}
mf->mf_flags &= ~MFF_MOUNTING;
+ }
if (this_error > 0)
goto failed;
if (this_error == 0) {
--- amd-6.1.5.orig/amd/map.c 2009-08-11 05:57:18.000000000 -0400
+++ amd-6.1.5/amd/map.c 2015-01-22 02:39:03.000000000 -0500
@@ -765,8 +765,7 @@
error = autofs_mount_fs(mp, mf);
else
#endif /* HAVE_FS_AUTOFS */
- if (!(mf->mf_flags & MFF_MOUNTED))
- error = mf->mf_ops->mount_fs(mp, mf);
+ error = mf->mf_ops->mount_fs(mp, mf);
if (error > 0)
dlog("mount_node: call to mf_ops->mount_fs(%s) failed: %s",
Not sure if these patches are correct but we'd definitely like to see
a fix for the race condition. I've included some (bowdlerized) logs below
illustrating the problem:
Jan 20 08:16:33 client amd[4623]: Timing out automount points...
Jan 20 08:16:33 client amd[4623]: "/mountpoint1" on /a/server/share timed out
Jan 20 08:16:33 client amd[4623]: Trying unmount in background
Jan 20 08:16:33 client amd[4623]: forked process 33532
Jan 20 08:16:33 client amd[33532]: backgrounded
Jan 20 08:16:33 client amd[33532]: backgrounded
Jan 20 08:16:33 client amd[4623]: Next mount timeout in 120s
Jan 20 08:16:33 client amd[33532]: Unmounting </mountpoint1>
</a/server/share> (server:/share) flags 15
Jan 20 08:16:33 client amd[33532]: attempting nfs umount
Jan 20 08:16:33 client amd[4623]: Discarding filehandle for server:/share
Jan 20 08:16:33 client amd[4623]: Select waits for 120s
Jan 20 08:16:33 client amd[33532]: Trying unmount(/a/server/share)
Jan 20 08:16:33 client amd[33532]: unlock_mntlist: releasing
-%-
Jan 20 08:16:33 client amd[4623]: NFS: mounting remote server
"server", remote fs "/share" on "/a/server/share"
Jan 20 08:16:33 client amd[4623]: Locating mntfs reference to
(/a/server/share,server:/share)
Jan 20 08:16:33 client amd[4623]: mf->mf_flags = 0x115
Jan 20 08:16:33 client amd[4623]: Got a hit with nfs
Jan 20 08:16:33 client amd[4623]: free_mntfs <//nil//> type error
mf_refc 1 flags 0
Jan 20 08:16:33 client amd[4623]: Immediately discarding mntfs for //nil//
Jan 20 08:16:33 client amd[4623]: in amfs_generic_mount_child
Jan 20 08:16:33 client amd[4623]: free_loc 0x2999f50
Jan 20 08:16:33 client amd[4623]: free_mntfs <//nil//> type error
mf_refc 1 flags 0
Jan 20 08:16:33 client amd[4623]: Immediately discarding mntfs for //nil//
Jan 20 08:16:33 client amd[4623]: mount of "server:/share" already pending
Jan 20 08:16:33 client amd[4623]: will retry ...
Jan 20 08:16:33 client amd[4623]: Arranging to retry mount of /mountpoint2
Jan 20 08:16:33 client amd[4623]: SLEEP on 0x36f7230
Jan 20 08:16:33 client amd[4623]: Saving RPC transport for /mountpoint2
Jan 20 08:16:33 client amd[4623]: Timing out automount points...
Jan 20 08:16:33 client amd[4623]: Next mount timeout in 217s
Jan 20 08:16:33 client amd[4623]: Select waits for 2s
Jan 20 08:16:33 client amd[33532]: unmount(/a/server/share) succeeded
Jan 20 08:16:33 client amd[33532]: Finished unmount(/a/server/share)
Jan 20 08:16:33 client amd[33532]: unlock_mntlist: releasing
Jan 20 08:16:33 client amd[4623]: Process 33532 exited with status 0
Jan 20 08:16:33 client amd[4623]: select interrupted
Jan 20 08:16:33 client amd[4623]: free_loc 0x203c2b0
Jan 20 08:16:33 client amd[4623]: free_mntfs </a/server/share> type
nfs mf_refc 3 flags 11
Jan 20 08:16:33 client amd[4623]: Commencing retry for mount of /mountpoint2
Jan 20 08:16:33 client amd[4623]: free_loc 0x2658380
Jan 20 08:16:33 client amd[4623]: free_mntfs </a/server/share> type
nfs mf_refc 2 flags 11
Jan 20 08:16:33 client amd[4623]: duplicate mount of "server:/share" ...
Jan 20 08:16:33 client amd[4623]: backgrounding mount of "/a/server/share"
Jan 20 08:16:33 client amd[4623]: forked process 33534
Jan 20 08:16:33 client amd[33534]: backgrounded
Jan 20 08:16:33 client amd[4623]: Timing out automount points...
Jan 20 08:16:33 client amd[4623]: Next mount timeout in 217s
Jan 20 08:16:33 client amd[4623]: Select waits for 217s
Jan 20 08:16:33 client amd[4623]: Process 33534 exited with status 0
Jan 20 08:16:33 client amd[4623]: select interrupted
Jan 20 08:16:33 client amd[4623]: amfs_cont: '/mountpoint2'
Jan 20 08:16:33 client amd[4623]: Mounting /mountpoint2 returned success
Jan 20 08:16:33 client amd[4623]: server:/share referenced fstype nfs
on /a/server/share
Jan 20 08:16:33 client amd[4623]: /mountpoint2 set to timeout in 300 seconds
Jan 20 08:16:33 client amd[4623]: Quick reply sent for /a/server/share
Jan 20 08:16:33 client amd[4623]: free_continuation
Jan 20 08:16:33 client amd[4623]: free_loc 0x2658380
Jan 20 08:16:33 client amd[4623]: free_mntfs </a/server/share> type
nfs mf_refc 2 flags 11