floating atoll: File folders: The carbon database filesystem

Walls of file cabinets can be found at any institution, containing thousands upon thousands of manually indexed documents. When someone wants to use a folder, they take it out, bring it to their desk, and work with it. Once done, it's put in a "to be filed" stack (or filed immediately). This is also the most common way to structure a filesystem on today's computers: cabinets of folders of files, all carefully packed away on the wall of the office, waiting to be carefully taken out, worked with, and then carefully put back.

The translation of a carbon-based filing system to a silicon-based filesystem leaves out one key component: most people don't have someone to file their "to be filed" stack. The clearest sign of this lack is a directory filled with thousands of files, distinguishable only by filename. What the silicon brings is the ability to manage a stack of thousands of papers with very little effort; all that's asked of the user is to choose a unique title.

Over time, some of those who initially have a folder with thousands of files will begin to create folders, for things that need to be lifted up out of the mess (urgent bills, closed cases, etc.). Given several years, many bookmark menus filled with links will end up carefully organized and sorted; people who start off putting every file in the "to be filed" folder make up simple ontologies ("Work", "Bills", "Family") and begin to refile their documents.

The analogy of a filing system to a database filesystem is a tough one: filing systems are generally subject to physical limits (you can't have two million pages in a single folder), and there are all sorts of features that don't exist outside of silicon. It's still an effective way to explain what precisely this new "database filesystem" feature of the next OS upgrade is, though -- many people keep their documents (paper and digital) in a filing system and put documents in the "to be filed" folder when they're done.

Silicon brings a second advantage to the table: now people can work with tremendously large collections of objects with very little effort. Searching fifteen thousand songs on my laptop takes approximately one second; searching four billion web pages on Google takes approximately one second. This is where database filesystems can shine, and where the most confusion will lie. It only takes a few seconds to change the filing system; instead of hiring extra interns and spending a week reorganizing filing cabinets, the silicon shifts things around immediately.

In many commonly used filesystems, each folder acts as a small database of files, and each file is assigned a "name" that must be unique within its folder. Files may have other properties, but with rare exception these are not used to uniquely identify files; a file's "extension" is considered part of the "name". NTFS stands apart by bringing a second unique identifier to the filesystem (a two-column primary key, in database terms), but it's not commonly used or recognized by most.
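To make the "folder as a database keyed by name" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and the add_file helper are all assumptions made up for illustration; no real filesystem stores its directories this way.

```python
import sqlite3

# Hypothetical schema: one table standing in for a single folder.
# The "name" column is the primary key, so it must be unique in the folder.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE folder (name TEXT PRIMARY KEY, size INTEGER, modified TEXT)"
)
conn.execute("INSERT INTO folder VALUES ('resume.doc', 24576, '2004-05-01')")


def add_file(name, size, modified):
    """Return True if the file was added, False if the name is already taken."""
    try:
        conn.execute(
            "INSERT INTO folder VALUES (?, ?, ?)", (name, size, modified)
        )
        return True
    except sqlite3.IntegrityError:
        # The folder refuses a second file with the same name.
        return False


added_new = add_file("bills.xls", 10240, "2004-05-02")
added_duplicate = add_file("resume.doc", 99, "2004-05-03")
file_count = conn.execute("SELECT COUNT(*) FROM folder").fetchone()[0]
```

The size and date columns ride along as ordinary properties; only the name decides whether a file may exist, which is the limitation the rest of this post pushes against.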

As the filesystem becomes a collection of documents with a convenient selection of perspectives, the filing system metaphor becomes somewhat strained. It's not considered efficient to reorganize a collection of files every five minutes when it takes tremendous amounts of manpower and logistics, yet it goes unnoticed on computers everywhere, hundreds of times a day. A stronger analogy is necessary, to provide an easy path for harnessing the new possibilities.

Astronomers work with a collection of millions of objects every day, using different perspectives such as "color", "brightness", "position", or even "name". By aiming their telescope to a given perspective, they can precisely locate a star; if their calculations (or assumptions) are incorrect, then further work is required. Eventually they get it within the viewfinder, work with it for a while, and then move on to the next perspective.

Bridging the analogies, astronomers work with a single file folder filled with all the objects they have (the "universe"); then by sorting through different perspectives (such as "name") they find what they seek and work with it. Imagine a planetarium with all your documents broadcast on the ceiling in small print, and you need only a pair of binoculars and a direction to look to find anything in your collection. Unlike a filing system, astronomers have no need to re-file things when they're done, since the only thing that changed was their perspective.
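The "only the perspective changes" idea can be sketched in a few lines of Python. The Document fields and the view helper below are hypothetical, chosen only to illustrate the point: one collection, several orderings, nothing ever re-filed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    name: str
    kind: str
    modified: str  # ISO date string, so plain string sorting is chronological


# The single "universe" of documents; it is never moved or reorganized.
UNIVERSE = (
    Document("contract.doc", "legal", "2004-03-10"),
    Document("budget.xls", "finance", "2004-05-01"),
    Document("resume.doc", "personal", "2004-01-15"),
)


def view(collection, perspective):
    """Return a new ordering of the collection by the given attribute."""
    return sorted(collection, key=lambda doc: getattr(doc, perspective))


by_name = [doc.name for doc in view(UNIVERSE, "name")]
by_modified = [doc.name for doc in view(UNIVERSE, "modified")]
```

Each call to view builds a fresh ordering and leaves UNIVERSE untouched, so there is no "to be filed" stack to clean up afterwards.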

A database filesystem, built properly, can allow the user to accrue every document in a single place, with the power to search through the collection efficiently. A document's "name" need not be unique, as long as the files with a given "name" are linked in some manner (say, revisions of a contract). With the ability to search through all the documents at once, filenames to some extent become moot; it's more effective for many to search for "Jan's resume" than to scroll through thousands of files sorted into directories (as evidenced by the recent popularity of Google, vs. Yahoo!). This is where the true power of a database filesystem lies.
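As a rough sketch of what that might look like, the snippet below keeps every document in one sqlite3 table where names are allowed to repeat and revisions of the same document are linked by a revision number. The schema, the sample rows, and the search helper are assumptions for illustration, not a description of any shipping database filesystem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: "name" is NOT unique; the row id identifies a document.
conn.execute(
    """CREATE TABLE documents (
           id INTEGER PRIMARY KEY,
           name TEXT, author TEXT, kind TEXT, revision INTEGER)"""
)
conn.executemany(
    "INSERT INTO documents (name, author, kind, revision) VALUES (?, ?, ?, ?)",
    [
        ("contract", "Pat", "legal", 1),
        ("contract", "Pat", "legal", 2),  # same name, linked as revision 2
        ("resume", "Jan", "personal", 1),
        ("resume", "Lee", "personal", 1),
    ],
)


def search(*terms):
    """Return rows where every term matches the name, author, or kind."""
    if not terms:
        return []
    clause = " AND ".join(
        "(name LIKE ? OR author LIKE ? OR kind LIKE ?)" for _ in terms
    )
    # Three placeholders per term, in the same order as the clause above.
    params = [f"%{term}%" for term in terms for _ in range(3)]
    sql = (
        "SELECT name, author, revision FROM documents "
        f"WHERE {clause} ORDER BY id"
    )
    return conn.execute(sql, params).fetchall()


jans_resume = search("jan", "resume")
contract_revisions = search("contract")
```

Two documents named "resume" coexist without trouble, and searching for "jan" and "resume" together finds the right one without anyone having chosen a unique title or a folder for it.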

Comments

By any chance did you ever try BeOS? The filesystem was effectively like a relational database. In addition, you could add custom filesystem attributes, and then search on those attributes. It's still leaps ahead of the various *nix filesystems in some ways, despite BeOS 5 being over 5 years old now.

It's a shame BeOS died, though you can still get updated versions of it online (though the kernel isn't updated) at beosmax.org.

Posted by: Daniel Berger | May 12, 2004 at 01:23 PM

I've always wanted to, but I never had a chance; it always sounded quite nice. As I understand it, though, a filesystem developer from Be went to work at Apple, which is planning to release a database filesystem soon.

Posted by: floating atoll | May 13, 2004 at 07:56 PM

you know i'd have to mention the feedback filesystem right? *grin*

google has also been working on their desktop search client, which aims to solve some of these same problems. there might be even more interesting advances yet to come this year from any number of upstarts. and of course the longhorn filesystem, but it's going to suck. :)

[i am biased towards implicit feedback methods wherever possible...]

Posted by: Martin Peck | May 19, 2004 at 01:03 AM

Hm, interesting... :-/ I never knew what carbon filesystems are.... Are they really good?

Posted by: Dmitry Yeskin | June 29, 2004 at 03:27 AM

the http://del.icio.us tagging system is quite effective - especially with the new tag intersection support... the metaphor could easily be applied to other domains.

Posted by: anselm | July 17, 2004 at 08:14 PM

What you are talking about has been around in information and library science for years. It's a great pity that many of those people involved in the organization of information who come out of a technology, computer or science background don't actually know about these studies, theories and practices.

Most of my work with people in the area of content management finds me advising people who are constantly reinventing the wheel in this respect. And they come up with systems that are often ineffective or wrong.

One of the truisms in this field is that what seems obvious or intuitive to educated and intelligent people re how to classify, organize etc. is usually not the best way to do this. Librarians know this. It is a great puzzle to me as to why persons have not looked at classification theory.

I love your analogy re astronomers and the stars. Look into faceted classification - this is really what you are talking about.

Posted by: Cathy | January 19, 2005 at 03:05 PM
