Lang Tips

How To Speed Up Some Programs


RedHat went to great lengths to internationalize RedHat 9.  It seems that almost everything can be read in about 20 different languages.
While this is great for lots of people around the world, it also slows a few things down quite a bit.

So let's look at how things are set up right out of the box.  If we run
  echo $LANG
we see
  en_US.UTF-8

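The en_US part seems normal: that is English as used in the US.  But what is that UTF-8?  I found a very good page on it at
the University of Cambridge; to sum it up, they describe it as the way to use Unicode under Unix-style operating systems.  A UTF-8 character can take anywhere from one to four bytes, so programs that work through text character by character, such as grep and sort, have more work to do.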
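UTF-8 stores any character outside plain ASCII as more than one byte.  A quick way to see this for yourself is to count the bytes in a few characters.  This is an illustration of our own, not something from the setup described here: utf8_bytes is a hypothetical helper name, and the characters are written as octal escapes using bash's $'...' quoting so the script itself stays plain ASCII.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how many bytes the string in $1 occupies.
# wc -c counts bytes, not characters, regardless of the locale.
utf8_bytes() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

echo "A:         $(utf8_bytes 'A') byte"                # plain ASCII letter
echo "e-acute:   $(utf8_bytes $'\303\251') bytes"       # U+00E9, encoded as C3 A9
echo "euro sign: $(utf8_bytes $'\342\202\254') bytes"   # U+20AC, encoded as E2 82 AC
```

This prints 1, 2 and 3 bytes respectively: the same "one character" costs a different number of bytes depending on what it is.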

SETTING THE LANG VARIABLE

The LANG variable gets set when you first log in.  You can always change it in your own startup scripts, but if you want to change it globally you need to edit the file /etc/sysconfig/i18n.
It will probably look like this initially:
LANG="en_US.UTF-8"
SUPPORTED="en_US.UTF-8:en_US:en"
SYSFONT="latarcyrheb-sun16"

You really only need to change the first line, so that the file looks like this:
LANG="en_US"
SUPPORTED="en_US.UTF-8:en_US:en"
SYSFONT="latarcyrheb-sun16"
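If you would rather script that edit than make it in a text editor, here is a minimal sketch.  The function name set_lang_in_file is our own invention; it keeps a .bak copy of the file, rewrites only the LANG= line, and leaves SUPPORTED and SYSFONT alone.  For the real /etc/sysconfig/i18n you would need to run it as root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set the LANG= line of an i18n-style file to $2,
# keeping a backup copy as "$1.bak".
set_lang_in_file() {
  local file=$1 new_lang=$2
  cp -p "$file" "$file.bak" || return 1   # keep a backup with the same permissions
  # Only lines that start with LANG= are rewritten; SUPPORTED= and SYSFONT=
  # do not match the anchored pattern. The value must not contain a "/",
  # since that is the sed delimiter here.
  sed "s/^LANG=.*/LANG=\"$new_lang\"/" "$file.bak" > "$file"
}

# Usage (as root):
#   set_lang_in_file /etc/sysconfig/i18n en_US
```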

On a related note, you can make sorting revert to the more traditional Unix order (plain byte order, so uppercase letters come before lowercase ones) by adding
LC_COLLATE="C"
This doesn't change the speed much, though.
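To see what LC_COLLATE actually changes, you can sort the same few words under different collation settings.  This is a small illustration of our own; sort_with_collation is a hypothetical helper.  LC_ALL is set to empty for the sort command because, if it were set to a value, it would override LC_COLLATE.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the remaining arguments one per line and sort
# them using the collation locale given as $1. An empty LC_ALL counts as
# unset, so LC_COLLATE decides the order.
sort_with_collation() {
  local coll=$1
  shift
  printf '%s\n' "$@" | LC_ALL= LC_COLLATE="$coll" sort
}

sort_with_collation C b A a B
# In the C locale this prints A, B, a, b: byte order, so every
# uppercase letter sorts before every lowercase one.

sort_with_collation en_US.UTF-8 b A a B
# If en_US.UTF-8 is installed, upper and lower case are usually grouped
# together instead; the exact order depends on the locale data.
```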

If you want to set the variable by hand:
In a Bash shell:
  export LANG="en_US"
In a C shell:
  setenv LANG en_US
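You can also change the locale for a single command instead of the whole session.  The sketch below is our own illustration: with_c_locale is a hypothetical helper and bigfile.txt is a placeholder.  It sets LC_ALL=C rather than LANG, because LC_ALL overrides LANG and every LC_* variable, so the command runs in the C locale no matter what else is set in your environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one command in the C locale. The assignment in
# front of "$@" applies only to that command, so the shell's own LANG
# (and LC_ALL, if any) is left untouched.
with_c_locale() {
  LC_ALL=C "$@"
}

# Usage (bigfile.txt is a placeholder):
#   with_c_locale grep -c 'some pattern' bigfile.txt
# From a C shell, the env command does the same job:
#   env LC_ALL=C grep -c 'some pattern' bigfile.txt
```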

EXAMPLES IN SPEED

So just how much does this change the speed?  Let's look.  The following table shows the same script run with the LANG variable set to three different values.  The script downloads some large text files (about 100 MB each), concatenates them, greps through them quite a bit, then does some cutting and sorting.  Because the real (wall-clock) time includes the download, it is more useful to compare the user and sys times.

          LANG=en_US.UTF-8   LANG=en_US   LANG=C
  real    9m22.368s          1m54.703s    1m10.904s
  user    8m44.540s          0m48.490s    0m7.970s
  sys     0m12.790s          0m12.520s    0m12.780s


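Here we see that while setting LANG to C gives a further speedup over en_US, both of them are hands-down winners over en_US.UTF-8.  User time drops from 8m44s with en_US.UTF-8 to about 48s with en_US and about 8s with C, while sys time stays at roughly 12.5 to 12.8 seconds in every case.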
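If you want to try a comparison like this on your own data, here is a rough sketch.  The original script isn't shown, so this is only a stand-in: the function name bench_locales, the input file, and the grep | cut | sort pipeline are assumptions chosen to mimic the kind of work described above.  As in the table, compare the user and sys lines of each run.  If a locale such as en_US is not installed, the programs will typically fall back to the C locale, so that run will not tell you much.

```bash
#!/usr/bin/env bash
# Sketch only: time the same text pipeline under each LANG value.
# bench_locales and the pipeline are stand-ins for the author's script.
bench_locales() {
  local file=$1 pattern=$2 lang
  for lang in en_US.UTF-8 en_US C; do
    echo "LANG=$lang"
    # bash's time keyword reports real/user/sys for the whole subshell.
    time (
      unset LC_ALL              # LC_ALL would override LANG
      export LANG="$lang"
      grep -i -- "$pattern" "$file" | cut -f1 | sort > /dev/null
    )
  done
}

# Usage (big.txt is a placeholder for your own large text file):
#   bench_locales big.txt error
```

Running it in a subshell keeps the LANG change from leaking into your interactive session.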