Comparing Windows paths to Debian paths
I've finally figured out how the POSIX directory called "bin" works. I've wondered about how to tell your operating system where program files are stored ever since I heard, circa 1992, that it was inefficient to keep adding stuff to your Windows PATH variable whenever you installed a new program.
In Windows, when you launch a program, you usually click on a shortcut, or double-click on a file, which in turn runs the command that starts your program. This command is called something like "word.exe". The question is: if word.exe is installed in C:\Program Files\SuperWord\SuperWord Talker 8.0\, how does Windows know where to find word.exe?
It's actually simple. When you installed SuperWord, the installer added the path "C:\Program Files\SuperWord\SuperWord Talker 8.0\" to your PATH variable. In fact, every time you installed a program, the installer added its path to the PATH variable. Before long, your PATH variable looks something like:
C:\Program Files\Windows Resource Kits\Tools\;%SystemRoot%\system32; %SystemRoot%; C:\Program Files\Microsoft SQL Server\80\Tools\Binn\;C:\kits\j2sdk1.4.2_10\jre\bin;C:\Program Files\Perforce;C:\Program Files\Common Files\GTK\2.0\bin;C:\bin;C:\Program Files\CVSNT\;%ANT_HOME%\bin;C:\Program Files\My Ass\Poo\;C:\Program Files\Mr. Poopy Pants\7.6\;C:\Program Files\Star Wards\;C:\Program Files\Crazy Crumbs Pick'm Up\;C:\Program Files\Lemmings 20\;C:\Program Files\Apache Web Server for Kids\;C:\Program Files\Microsoft\Internet\Explorer\4\;C:\Program Files\Netscape Navigator\1.0\;C:\Program Files\Flock\;C:\Program Files\Clock\;C:\Program Files\Block\;C:\Program Files\Truck\;C:\Program Files\Crock\;C:\Program Files\SuperWord\SuperWord Talker 8.0\;
Et cetera, et cetera.
Then, every single time you launch any program whatsoever, Windows always looks in every single directory listed in your PATH variable until it finds the program whose command you just launched.
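To make that search concrete, here is a minimal Python sketch of a PATH lookup. The function name find_in_path is made up for illustration, and the real Windows lookup does more than this (the command prompt, for example, also checks the current directory and tries the extensions listed in PATHEXT), so treat it as an approximation of the idea rather than what Windows actually runs.

```python
import os

def find_in_path(command, path_value, sep=os.pathsep):
    """Return (full_path, directories_checked) for the first match, or (None, n)."""
    checked = 0
    for directory in path_value.split(sep):
        directory = directory.strip()
        if not directory:
            continue  # skip empty entries such as a trailing separator
        checked += 1
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate):
            # Stop at the first directory that contains the command.
            return candidate, checked
    return None, checked
```

The second value in the result is the number of directories that were looked at, so a command that lives in the last entry of a long PATH costs one check per entry before it is found.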
So, in 1992, I heard that this was "inefficient". It strikes me as fairly intuitive why this way of searching for the location of a program could be inefficient: you're looking through a lot of irrelevant directories. Why not just keep the directory name with the program itself?
Since that time, whenever I've gone on a cleaning binge, I've tried to remove entries from my PATH variable and find other ways to store the location of each command in a way that my operating system could read it. One of these techniques was simply to add the path to the command in every dialog box that asked for it. For example, when I change the default text editor that my browser launches, I enter the entire path, not just the name of the command. Another thing I like to do in Windows is to associate a different editor with any files that are in text format. In this case, I go through the Windows Registry and change all the paths that say "notepad.exe" to the full path to my preferred editor. I would do likewise with my web browser. This keeps my PATH variable short. However, it does mean entering a path in a lot of different places.
I had found it intriguing for quite some time, then, that in a POSIX-style system, all of the commands that run programs are stored in the directory called "/bin". How, I wondered, does the system manage the fact that every single program is in the same directory? Windows doesn't do this: Windows might put every program's directory in the big Program Files directory, but Program Files has a subdirectory for every program. Not in Debian: there, you have /bin, and all that's in /bin are files, one file per program. So the question floating at the back of my mind since I discovered the POSIX directory layout was, what happens when these programs in /bin need access to other files? How do they know where to search for them? What if they need libraries or configuration files? How do they point to all these things? If all you have is a program called /bin/toot, how does toot know where to find tootlib, tweet, boing or pow?
The answer is of course as simple as the Windows solution: the things in /bin are not programs at all, but just the commands that actually run the programs. If you open up /bin/toot in a text editor, you might find inside a line that actually runs the real program, and that specifies the location of that program, for example, a line like: "/usr/share/system/Program\ Files/My\ Program\ Files/Toot/8.0/toot.exe --volume 80 --nologging".
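For the case where an entry in /bin is a launcher of this kind, here is a rough Python sketch that writes one. The helper name write_wrapper and its parameters are my own invention for illustration, and the script it produces is just one plausible shape for such a launcher: a two-line shell script that hands over to the real program and passes along whatever arguments it was given.

```python
import os
import shlex
import stat

def write_wrapper(bin_dir, name, target, extra_args=()):
    """Write a small shell launcher at bin_dir/name that runs target."""
    # Quote each part so that paths containing spaces survive the shell.
    command = " ".join(shlex.quote(part) for part in (target, *extra_args))
    # "$@" forwards any arguments given to the launcher itself.
    script = "#!/bin/sh\n" + f'exec {command} "$@"\n'
    wrapper_path = os.path.join(bin_dir, name)
    with open(wrapper_path, "w") as handle:
        handle.write(script)
    # Mark the launcher executable for user, group and others.
    mode = os.stat(wrapper_path).st_mode
    os.chmod(wrapper_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return wrapper_path
```

Calling something like write_wrapper("/bin", "toot", "/usr/share/system/Program Files/My Program Files/Toot/8.0/toot.exe", ["--volume", "80", "--nologging"]) would produce a launcher along the lines of the one quoted above (writing into /bin normally needs administrator rights).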
The advantage is, now when you run a program, the operating system only ever has to look in /bin to find the command that runs that program. That means that, if searching through a bunch of directories takes time, then you've saved time!
I don't know, however, if searching through all the directories listed in the PATH variable really takes all that much time. Computers have improved since 1992 after all. But my friend Arnold tells me that disk access is an important bottleneck (see Getting *ed by windows explorer) in computer speed, especially when multi-gigabyte disks start getting full. It would seem to me, intuitively, that reducing the number of directories that an operating system has to search to find a program would speed up the time it takes to launch that program, wouldn't it?
I could go through an analysis of the process of looking up a directory name in a filesystem table, then following that entry to the actual location of the directory on a disk, but I don't have the time right now. I'm pretty sure that no matter how you optimised your filesystem table, you still get better performance by not searching through a bunch of irrelevant directories.
All this said, this technique is by no means limited to Linux. You can make a directory called C:\bin on Windows, and you can put text files in this directory, each of which contains one line that actually runs the program. (On Windows, these need to be batch files, with an extension such as .bat, for the system to run them.) For notepad, you could make a file called C:\bin\notepad.bat, and put a line in it like "C:\Windows\notepad.exe". Or you could make a file called C:\bin\firefox.bat, and you could put in the line "C:\Program Files\Mozilla Firefox\firefox.exe". Then, you could remove the entry "C:\Program Files\Mozilla Firefox" from your PATH variable. Now, all you need to do when you want to run firefox, whether you are in Excel's Preferences window or just in the Windows Run dialog, is type "firefox". Of course, don't forget to add "C:\bin" to your PATH variable, since you still need to tell Windows that it has to always search C:\bin whenever it runs a program.
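As a sketch of what such a launcher file could contain, here is a small Python helper that builds the text. The name batch_launcher_text is hypothetical, and I am assuming the result is saved with a .bat extension (for example C:\bin\firefox.bat) so that Windows treats it as something it can run. The quotes around the path matter because of the space in "Program Files".

```python
def batch_launcher_text(target_exe):
    """Return the contents of a .bat file that starts target_exe."""
    # 'start' asks cmd.exe to launch the program without waiting for it;
    # the empty "" is the window title that 'start' expects before a quoted path,
    # and %* forwards any arguments given to the launcher.
    return "@echo off\r\n" + f'start "" "{target_exe}" %*\r\n'
```

For Firefox, batch_launcher_text(r"C:\Program Files\Mozilla Firefox\firefox.exe") gives a two-line file: "@echo off" followed by the start line with the quoted path.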
This is still a bit ugly on Windows, because it means that, now, instead of just Firefox nicely launching in a Firefox window like it always has, you will see a black command line window open first, a few instants before Firefox appears. And then, that black command line window will take up space on your Taskbar and on your programs list whenever you hit Alt+Tab. So I wouldn't recommend doing this for all your GUI programs. For more silent, batch tools like xsltproc or perl, though, this is much better than listing the path to every single one of them in PATH. I mean, at least it would have been much better in 1992.