Electric Forest

thoughts about books, digital libraries, and stuff related to expressing and keeping track of our thoughts...

Sunday, July 31, 2005

Hierarchy vs. Facets vs. Tags

A paper with that title is found here.
I am not yet sure I know what to make of it; there is a lot of material there, and links to related resources. The issues raised are real and important.

A truly first-rate hierarchy would not only have all of the characteristics of FN's hierarchy, but it would also manage to encode the hierarchy in such a way as to eliminate all ambiguity as to where an item might be found. FN comes pretty close. But you can always imagine that it might be hard to decide where that sock garter really goes? Bottoms? Legs? Ankles? Feet? It's also easy to imagine how that favorite pair of stretchy pants might do equally well @Home or @Gym.

[As a result, Hierarchies are horrible at #3: Targeted search and retrieval of individual items. In a hierarchy where items can only live in 1 place, the messier the hierarchy is, the harder it is to figure out where to put an item and the harder it is to figure out where you put it, when it's time to find it.]
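To make the single-location problem concrete, here is a minimal Python sketch, assuming a toy wardrobe modeled two ways: as a hierarchy where each item has exactly one path, and as a tag index where an item can carry several labels. The paths and tags are hypothetical, borrowed loosely from the sock-garter and stretchy-pants examples; this is an illustration, not the paper's own model.

```python
from collections import defaultdict

# Hierarchy (hypothetical paths): every item lives at exactly one path,
# so whoever files it must pick one answer.
hierarchy = {
    "sock garter": ("Clothing", "Legs"),      # could just as well be Ankles or Feet
    "stretchy pants": ("Clothing", "@Home"),  # could just as well be @Gym
}

def find_in_hierarchy(path):
    """Return the items filed at exactly this path."""
    return sorted(item for item, p in hierarchy.items() if p == path)

# Tags (hypothetical labels): an item can carry every label that plausibly applies.
tags = {
    "sock garter": {"legs", "ankles", "feet"},
    "stretchy pants": {"@home", "@gym", "bottoms"},
}

# Invert the tag sets into an index from tag to items.
index = defaultdict(set)
for item, labels in tags.items():
    for label in labels:
        index[label].add(item)

def find_by_tag(label):
    """Return every item carrying this tag (empty list if none)."""
    return sorted(index.get(label, set()))

print(find_in_hierarchy(("Clothing", "Feet")))  # nothing: it was filed under Legs
print(find_by_tag("feet"))
print(find_by_tag("@gym"))
```

The hierarchy only answers the question the item was filed under, while the tag index tolerates several plausible answers. Someone still has to choose the tags, though, so the ambiguity is moved rather than removed.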

But as you'll see, this is a problem even in faceted classification systems.

It turns out that the author of that paper has her papers indexed here, with plenty to read.

Friday, July 29, 2005

Data First vs. Structure First

The title comes from this blog entry. But first, consider this one, where Jon Udell walks us through a day in the life of a tagger at del.icio.us, followed by a short sprint on how language grows. The two entries are linked. Now consider this quote from the entry that supplied the title:
Some believe that the semantic web is an example of 'structure first' but it's really not the case.... yet, many and many people truly believe that in order to be successful a 'Structure First' design (well "ontology first" in this case) is the way you build interoperability.

As you might have guessed, I disagree.
Stefano Mazzocchi argues, first, that:
Data First strategies have higher usability efficiency (all rest being equal) than Structure First strategies.
followed by:
On a local time-scale and once established, "Structure First" systems are more efficient.
That reminds me of the old scruffies vs. neats arguments that ran rampant in AI circles for so many years.

My take: the jury's still out on this one.

Wednesday, July 27, 2005

Metadata copyright proposal

Following the links in Jack's post, and the links in those, led me to an article in the Proceedings of the National Academy of Sciences called "Liberalization of PNAS copyright policy".

In short, it says:
Our guiding principle is that, while PNAS retains copyright, anyone can make noncommercial use of work in PNAS without asking our permission, provided that the original source is cited. For commercial use (e.g., in books for sale or in corporate marketing materials), we approve requests on an individual basis and may ask for compensation.
I think metadata should be handled the same way. That would answer the fears of indexers that they, alone among the contributors to a book's creation, would lose their rights.

Opening the Gates to Information Commons

The title is cloned from this blog.
While respecting the right of corporations to charge for information, some information professionals are calling for fewer restrictions on its distribution and are lobbying for, or actively participating in, the creation of "information commons" -- a new way of producing and sharing information, creative works and democratic discussions. Like information portals, these "commons" (drawn from the historical existence of the English commons -- pieces of land to which members of a community had specific rights of access) are digital repositories of thematically related information.

There seems to be a trend here. Recall PNAS (Proceedings of the National Academy of Sciences), PLoS (Public Library of Science), and arXiv.org (archives of physics preprints). Bernard Vatant mentions sustainable IT, and Roger says metadata should be free. There are numerous other examples, which I can't put my finger on right now; I'll likely add them in the comments.

Monday, July 25, 2005

Mapping knowledge domains

I'll be brief here. This post is mostly about citation analysis, one of many means by which knowledge domains can be mapped. A paper with that title in PNAS serves as the introduction to an issue of the proceedings devoted to the subject. A link in that paper points to a kind of homepage on the same subject. From there, another link leads to a variety of papers on mapping various knowledge domains.
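Citation analysis comes in many forms; one common building block is co-citation counting, where two papers are treated as related when later papers cite them together. The sketch below is a minimal Python illustration using a made-up citation list (the paper IDs are hypothetical); it is not the specific method used in the PNAS papers.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each citing paper maps to the papers it cites.
citations = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
    "P4": ["A", "B", "D"],
}

def cocitation_counts(citations):
    """Count how often each unordered pair of papers is cited together."""
    counts = Counter()
    for cited in citations.values():
        # set() drops duplicate references; sorted() gives each pair a canonical order.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(citations)
for (a, b), n in counts.most_common(3):
    print(a, b, n)
```

Pairs with high counts (here A and B, cited together by three papers) can serve as weighted edges in a graph of the field, which is one way such domain maps are built.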

Wednesday, July 20, 2005

How do you read in a digital library?

Over at Teleread, I've posted an article about cross-platform reading.

For a digital library, the question must be: in what format should we make an e-book available so that everyone who wants to read it can access it?

I don't regard an ASCII text file as an e-book format, nor HTML read in today's browsers as an e-book, yet there are no other universal formats that can be read on Windows, Macintosh, and Linux computers as well as on the handheld devices: Palm, PocketPC, and WinCE. (I not so reluctantly omit smartphones, but that is strictly my indefensible bias.)

A desktop reader that formats Project Gutenberg's ASCII texts with italics and bold and makes them presentable is in preparation. That may be one way out of the labyrinth.
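As a rough idea of what such formatting involves, here is a minimal Python sketch, assuming the common plain-text convention of _underscores_ for italics and *asterisks* for bold. Actual Gutenberg files are not uniform, and the function name `gutenberg_to_html` is hypothetical; this is not the reader mentioned above.

```python
import html
import re

# Assumed markup conventions; real Project Gutenberg files vary.
ITALIC = re.compile(r"_([^_\n]+)_")
BOLD = re.compile(r"\*([^*\n]+)\*")

def gutenberg_to_html(text):
    """Turn blank-line-separated plain-text paragraphs into simple HTML."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Re-join hard-wrapped lines into a single paragraph.
        para = " ".join(line.strip() for line in block.splitlines())
        para = html.escape(para, quote=False)  # escape &, <, > before adding tags
        para = ITALIC.sub(r"<i>\1</i>", para)
        para = BOLD.sub(r"<b>\1</b>", para)
        paragraphs.append(f"<p>{para}</p>")
    return "\n".join(paragraphs)

sample = """It was the _best_ of times,
it was the *worst* of times.

A new paragraph."""
print(gutenberg_to_html(sample))
```

Re-joining hard-wrapped lines matters as much as the italics: Gutenberg texts are wrapped at a fixed width, and a reader has to reflow them to fit whatever screen it runs on.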

And, of course, independently of our own choices, most commercial books are available in a restricted number of formats, with ASCII and HTML never an option.

Commercial publishers can ignore certain platforms or e-book formats if, for instance, they don't like (that is, trust) the DRM. Should digital librarians similarly advise patrons that some platforms are unacceptable?

I'm thinking particularly of Linux, for which only one e-book reader is currently available: Plucker. You won't see any commercial titles in Plucker format, since it lacks DRM, so it seems likely to stay near the bottom of the list of supported formats. Do we totally write off Linux users as patrons of our digital libraries?

(Note: two new readers, OpenBerg and GIVE, are in preparation and will have Linux versions. Both are non-DRM, of course, which returns us to square one.)

Well, I haven't mentioned PDF, of course, which you can read on Linux machines. PDF is pretty much unsatisfactory for e-books, unless you want to do all your reading on a desktop computer. If you choose to support Palm, PocketPC, Internet Tablet, and WinCE devices, you probably need a PDF version for each screen width offered; that is to say, you can't have one file for everyone even though every version has a .pdf extension.

Not a pretty picture. As David Rothman is fond of saying, it's a Tower of eBabel. He and Jon Noring propose the OpenReader format. Currently there are no readers that accept the OpenReader format and no e-books in it either. It may be too hypothetical for now.

So, what format do we offer our books in?