Saturday, January 24, 2009

Why Linux filesystems (ext2, ext3) get slow with more than 1K files in a directory

I knew that this happens, but not why in detail.
(This article is the English version of a post on my Japanese blog.)

The ls command in particular is very slow, so I assumed the readdir call was too slow. But I was WRONG.
The overhead is in the algorithm of the ls command.
An article on the linux-users mailing list ( http://his.luky.org/ML/linux-users.6/msg08919.html ) taught me that.
ls collects not only the list of filenames but also each file's attributes (size, permissions, etc.).
So ls checks the attributes of every single file, and that takes a long time.
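
To make that per-file cost concrete, here is a minimal sketch of what an "ls -l" style listing has to do: one lstat() per directory entry on top of readdir(). The helper name list_with_attributes and the fixed 4096-byte path buffer are assumptions made for illustration; this is not the actual ls source.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: walks `path` and calls lstat() once per entry,
 * the way an attribute-collecting listing has to.
 * Returns the number of entries seen (readdir() also reports "." and
 * ".."), or -1 if the directory cannot be opened.
 * The sum of st_size over all entries is stored in *total_bytes. */
long list_with_attributes(const char *path, long long *total_bytes)
{
    DIR *dirp = opendir(path);
    struct dirent *dp;
    struct stat st;
    char full[4096];    /* assumed large enough for path + "/" + name */
    long count = 0;

    if (dirp == NULL)
        return -1;

    *total_bytes = 0;
    while ((dp = readdir(dirp)) != NULL) {
        int n;

        count++;
        n = snprintf(full, sizeof(full), "%s/%s", path, dp->d_name);
        if (n < 0 || (size_t)n >= sizeof(full))
            continue;   /* name does not fit in the buffer: skip lstat */
        /* This is the extra work: one more system call per file. */
        if (lstat(full, &st) == 0)
            *total_bytes += st.st_size;
    }
    closedir(dirp);
    return count;
}
```

Calling this from a small main() and running it under 'strace -c' should show roughly one stat-family call per file in addition to the directory reads, which is the overhead described above.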

In my application, only the list of filenames is required, so plain readdir works just fine.
Here is the sample code (almost the same as the example in the readdir manpage!)



#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    DIR *dirp;
    struct dirent *dp;

    /* Exactly one argument is expected: the directory to list. */
    if (argc != 2) {
        fprintf(stderr, "usage: %s directory\n", argv[0]);
        return 1;
    }

    if ((dirp = opendir(argv[1])) == NULL) {
        perror("couldn't open dir");
        return 1;
    }

    /* readdir() returns NULL both at end of directory and on error,
       so reset errno before each call to tell the two apart. */
    do {
        errno = 0;
        if ((dp = readdir(dirp)) != NULL) {
            (void) printf("%s\n", dp->d_name);
        }
    } while (dp != NULL);

    if (errno != 0)
        perror("error reading directory");

    (void) closedir(dirp);
    return 0;
}

Next, I checked the performance of readdir itself.
With over 10K files in a directory, it takes 300 msec (Kernel 2.6.9-42.ELsmp, CentOS 4.4, 4800 bogomips).
Here is the result of 'strace -c':


# strace -c readdir . > /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 80.44    0.442220          14     30821           getdents64
 19.48    0.107076           8     13018           write
  0.03    0.000149         149         1           execve
  0.01    0.000054          11         5           old_mmap
  0.01    0.000038          13         3           open
  0.01    0.000035          35         1           read
  0.01    0.000031           8         4           fstat64
  0.01    0.000030           8         4           brk
  0.00    0.000022          11         2           mprotect
  0.00    0.000014           5         3           close
  0.00    0.000013          13         1           munmap
  0.00    0.000011          11         1         1 access
  0.00    0.000008           8         1           mmap2
  0.00    0.000008           8         1           fcntl64
  0.00    0.000007           7         1         1 ioctl
  0.00    0.000007           7         1           uname
  0.00    0.000003           3         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.549726                 43869         2 total

The most time-consuming system call is getdents64. The system (disk) cache speeds this system call up.
If everything is served from the cache, getdents64 takes only 10 usec for 1M files in a directory.
If you try this on an NFS-mounted directory, the cache effect may be small.
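
If you want to see the cache effect without strace, a rough sketch like the following times only the readdir loop. The helper name time_readdir is an assumption made for illustration, and the numbers depend heavily on the machine and filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <time.h>

/* Hypothetical helper: counts the entries of `path` using readdir()
 * only (no per-file stat) and stores the elapsed wall-clock time in
 * milliseconds in *elapsed_ms.
 * Returns the entry count, or -1 if the directory cannot be opened. */
long time_readdir(const char *path, double *elapsed_ms)
{
    struct timespec t0, t1;
    long count = 0;
    DIR *dirp = opendir(path);

    if (dirp == NULL)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (readdir(dirp) != NULL)
        count++;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    closedir(dirp);

    *elapsed_ms = (t1.tv_sec - t0.tv_sec) * 1e3
                + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    return count;
}
```

Calling it twice in a row on the same big directory and printing both timings is one way to compare a cold read with a cached one; on an old glibc you may need to link with -lrt for clock_gettime.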

Wednesday, January 21, 2009

Fixing Makefile to install Erlang R12B-5(R12B-4) on CentOS5.2

I noticed that the Makefile in the Erlang R12B-4 source tarball is incomplete.
To build Erlang from source on CentOS 5, an additional library option is required.

For details, see Peter Lemenkov's post to the erlang-questions ML.
http://www.erlang.org/pipermail/erlang-questions/2008-August/037237.html

Or, if you can read Japanese, refer to my blog post in Japanese.
http://d.hatena.ne.jp/kgbu/20080909/1220984877

Here is the patch for the "lib/ssl/c_src/Makefile.in".
http://cvs.fedoraproject.org/viewvc/rpms/erlang/EL-5/otp-ssl_missing_libs.patch?view=auto&revision=1.1

The bug still exists in Erlang R12B-5 (the latest version as of today).

Saturday, January 17, 2009

Checkpoints to Install rdiff-backup

Checkpoints to install rdiff-backup (in Linux)

In case you get stuck ...

1) check that the python-devel package is installed
2) check the librsync library dependency
3) check the librsync compile options. You may need to recompile it with the -fPIC option

See http://wiki.rdiff-backup.org/wiki/index.php/RdiffBackupWiki

Below is a summary of the operations (in my environment: Fedora Core 5, x86_64).
(This post is the English version of my original blog post in Japanese.)
# rpm -aq | grep python-devel
# yum install python-devel

# rpm -aq | grep librsync
# wget -O librsync-0.9.7.tar.gz 'http://downloads.sourceforge.net/librsync/librsync-0.9.7.tar.gz?modtime=1097439809&big_mirror=0'
# tar zxf librsync-0.9.7.tar.gz
# cd librsync-0.9.7
# ./configure
# make AM_CFLAGS=-fPIC
# make install
# ldconfig
# cd ..

# wget http://savannah.nongnu.org/download/rdiff-backup/rdiff-backup-1.2.5.tar.gz
# tar zxf rdiff-backup-1.2.5.tar.gz
# cd rdiff-backup-1.2.5
# python setup.py install

Wednesday, January 7, 2009

first post

This is my first post.


Here, I'll keep some memos/flakes of my monkey-typed code.
So, please look them over if you have plenty of spare time.