Application Note 0002
Draft 2.2
October 15, 2006


Why does ASEM-51 V1.3 hang under Linux?
=======================================

asem 1.3 and hexbin 2.3 may hang when invoked

  - on Linux systems with kernel 2.6 through 2.6.10,
  - on ext3 filesystems with dir_index enabled,
  - in subdirectories of large parent directories,
  - and with file parameters with relative paths.

The reason is that the hashed b-tree implementation (htree) for the
ext3 filesystem in kernel 2.6 contained severe bugs, which were not
all fixed before version 2.6.11. As a result, the system function
readdir sometimes returned wrong results.
For details see the kernel 2.6.11 changelog:

   ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.11

-----------------------------------------------------------------------
[PATCH] ext3 htree telldir() fix

telldir() is broken on large ext3 dir_index'd directories because
getdents() gives d_off==0 for the first entry

Here's a patch which fixes the problem, but note the following warning
from the readdir man page:

    According to POSIX, the dirent structure contains a field char
    d_name[] of unspecified size, with at most NAME_MAX characters
    preceding the terminating null character. Use of other fields will
    harm the portability of your programs.

Also, as always, telldir() and seekdir() are truly awful interfaces
because they implicitly assume that (a) a directory is a linear data
structure, and (b) that the position in a directory can be expressed in
a cookie which has only 31 bits on 32-bit systems.

So there will be hash collisions that will cause programs that assume
that seekdir(dirent->d_off) will always return the next directory entry
to sometimes lose directory entries in the
not-as-unlikely-as-we-would-wish case of a 31-bit hash collision.

Really, any program which is using telldir/seekdir really should be
rewritten to not use these interfaces if at all possible. So with these
caveats....

What we need to do is wire '.' and '..'
to have hash values of (0,0) and (2,0), respectively, without ignoring
other existing dirents with colliding hashes. (In those cases the
programs will break, but they are statistically rare, and there's not
much we can do in those cases anyway.)

Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-----------------------------------------------------------------------

To determine whether your system is affected, and to fix it, perform
the following steps:

1. Check your kernel version. Type at the shell prompt

      uname -a

   If your kernel is older than 2.6 or newer than 2.6.10, you have no
   problem.

2. If you have a 2.6.x kernel older than version 2.6.11, list your
   mounted filesystems with

      df -T

   If you are not working on ext3 filesystems, you have no problem.

3. If your home directory is on an ext3 filesystem, log in as root
   and check whether its dir_index option is enabled, e.g.

      tune2fs -l /dev/hdxy | grep features

   If the filesystem features do not include the dir_index option,
   you have no problem.

4. If dir_index is enabled, change to your ASEM-51 working directory
   and invoke asem as follows:

      asem blink.a51

   Be sure to try this in a directory whose parent directory is
   larger than 1 block (to ensure that it is really indexed)!
   Specify a file parameter with a relative path.
   If asem terminates (with or without errors), you do not have a
   problem so far.

5. If it runs into an endless loop, however, you have exactly the
   "readdir problem" described above.
   In this case a quick workaround is to specify all file parameters
   for asem and hexbin with absolute paths, e.g.

      asem /home/me/8051/blink.a51
      hexbin /home/me/8051/blink.hex

   This prevents asem and hexbin from calling readdir.

6. For a more thorough fix, you can switch off directory indexing
   for the whole filesystem.
   First of all, ensure that it is properly unmounted.
   (If it happens to be your root filesystem, shut down your
   computer and reboot from a live CD.)
   Then type (as root)

      tune2fs -O ^dir_index /dev/hdxy
      e2fsck -Df /dev/hdxy

   It is said that running e2fsck is not absolutely necessary, but it
   is highly recommended, because it optimizes the filesystem for
   traditional linear directories.

7. Of course the best solution is to update to kernel 2.6.11 or
   later, or to update the whole Linux distribution, e.g. to

      Mandriva Linux 10.2   (kernel 2.6.11)
      Fedora Core 4         (kernel 2.6.11)
      Ubuntu 5.10           (kernel 2.6.12)
      SuSE Linux 10.0       (kernel 2.6.13)
      ...

   or any later version. The latest stable Debian release 3.1 (Sarge)
   still comes with kernel 2.4.27 and doesn't cause any problems.
   If you are using another Linux distribution, please check the
   kernel version yourself, and update if necessary.

8. If your asem or hexbin also hangs with other kernels or
   filesystems, or under different conditions, please report it as
   soon as possible! Thanks in advance.


Thanks to Michael de Nil and Richard Stover for reporting this problem.

The next ASEM-51 version (1.4, or something) will be compiled with the
latest FreePascal release, whose runtime system no longer calls
readdir, and the above problem will (hopefully) be solved.

----------------------------------------------------------------
Check out the ASEM-51 support website: http://plit.de/asem-51/
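

Appendix: a sketch of the kernel check in step 1

The version comparison in step 1 can also be done by a small shell
function. The following is only a minimal sketch, not part of the
ASEM-51 distribution: the function name kernel_affected is
hypothetical, and it assumes that the kernel release string reported
by "uname -r" starts with a dotted version number such as 2.6.8.
It covers step 1 only; the filesystem checks in steps 2 and 3 still
have to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of ASEM-51): succeeds if the given kernel
# release lies in the affected range 2.6.0 through 2.6.10.
# With no argument, the running kernel's release (uname -r) is checked.
kernel_affected() {
    release=${1:-$(uname -r)}
    # Keep only the leading digits and dots: "2.6.8-24-smp" -> "2.6.8"
    version=${release%%[!0-9.]*}
    # Split the version at the dots into positional parameters
    old_ifs=$IFS
    IFS=.
    set -- $version
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    [ "$major" -eq 2 ] && [ "$minor" -eq 6 ] && [ "$patch" -le 10 ]
}

if kernel_affected; then
    echo "Kernel is in the range 2.6 to 2.6.10: continue with step 2."
else
    echo "Kernel is outside the range 2.6 to 2.6.10: no problem."
fi
```

A release string can also be passed explicitly, e.g.
"kernel_affected 2.6.8-24-smp", to check a kernel other than the
running one.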