Friday, May 27, 2005
Tagsonomies and digital libraries
I know something about e-books and I see the most serious problem with e-books as publishers' completely ignoring the computer chip in the reading system [1]. I know little about the development of digital libraries. But I suspect they have missed the point of the web, whose model radically shifts the boundaries between author (or publisher) and reader, and I expect they are not properly formulated for a web world.
The classical functions of the library (or librarian)[2], as enunciated in The Intellectual Foundation of Information Organization by Elaine Svenonius and rewritten here for an electronic repository, are to enable the user:
- To locate a specific document when certain attributes (such as the title, author or date of publication) are known in advance (the so-called finding objective)
- To locate a set of documents representing any of the following (together these constitute the collocating objective):
  - All the documents from the same author or organization or governmental entity or court
  - All the documents as part of the same series or aggregation or issuance
  - All the documents "published" or released within a specific time-frame, or before or after a specific date
  - All the documents on a specific subject
  - All the documents referred to or cited within a single or master or central document
  - All the documents defined by some "other" criteria
  - All the documents defined by some combination of the above criteria
- To choose among different types of documents, which are more or less suitable to the user's needs (the choice objective)
- To acquire access to the document or set of documents, through electronic delivery on-screen, download, printing, faxing, or other mechanism, in real time even if not pre-arranged (the acquisition objective)
- To navigate the collection — that is, to find documents related to a given document or to a given subject by generalization, association and aggregation, or to travel along axes of equivalence, association or hierarchy (the navigation objective)[3]
Earlier, I said libraries could combine the index from each of the books, electronic or otherwise, in their collection as a replacement for the subject catalog. This unified index would enable users to locate relevant material even when it was not one of the two or three most significant categories the entire book would fall into. And even if permissions allowed only one reader at a time of any segment, electronification could then allow multiple simultaneous users of a single licensed copy, each reading the segment of interest to them: In other words, you read chapter 4 on Leonardo while I read about Giotto in chapter 1.
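As a rough illustration of the unified index idea, here is a minimal Python sketch. It assumes each book's index is available as a mapping from term to the segments where the term appears; the function name, the book titles, and the sample entries are all hypothetical, and a real system would need far more careful term matching than lowercasing.

```python
from collections import defaultdict


def build_unified_index(book_indexes):
    """Merge per-book indexes into one collection-wide index.

    book_indexes maps a book title to that book's own index,
    {term: [segment, ...]}. The result maps each lowercased term to a
    sorted list of (book title, segment) pairs.
    """
    unified = defaultdict(set)
    for title, index in book_indexes.items():
        for term, segments in index.items():
            key = term.strip().lower()  # crude normalization (assumption)
            for segment in segments:
                unified[key].add((title, segment))
    return {term: sorted(locations) for term, locations in unified.items()}


# Hypothetical sample data: two books whose indexes both mention Giotto.
book_indexes = {
    "Art History Survey": {"Giotto": ["ch. 1"], "Leonardo": ["ch. 4"]},
    "Painters of Florence": {"giotto": ["ch. 2"]},
}
unified_index = build_unified_index(book_indexes)
print(unified_index["giotto"])
```

A lookup on one term then returns every segment in the collection that discusses it, whether or not the book as a whole is catalogued under that subject.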
But a more significant expansion of the notion of a subject catalog rests in the collective knowledge of web users. If, as Clay Shirky persuasively argues, formal taxonomies are (often) inferior to collaborative tagsonomies,[4] why shouldn't digital libraries involve the library users as cataloguers? Yes, individual inexperienced users don't measure up to a professional, but who thinks Zagat's restaurant ratings are inferior to professional reviewers'? The collective knowledge of average restaurant-goers prevents really wrongheaded assessments and ultimately causes the overall evaluation to center around legitimate issues. (And as a counterweight to that "lack of expertise," the body of all readers knows of more obscure but wonderful places to eat than any individual professional reviewer.) If every reader of The Renaissance, volume five of Will Durant's Story of Civilization, supplied terms that apply to chapter 17, however many variants there may be on "Julius II" and "Michelangelo" and "Raphael" and "Rome" and on other aspects of popes and art, we know that "David" and "Sistine Chapel" will not be left out. And isn't it likely that the "Sleeping Cupid" will be mentioned too? So what is that chapter about? The Sistine Chapel frescoes or Pope Julius? A single category is inherently wrong, and yet we rely on single categories for entire books in the current scheme of things. If we look at how people categorize things, we learn what it is they want to find. That's why you want every reader to contribute.
And when I say "every reader," I mean every reader, no matter what collection (the NY Public Library's, the Montclair library's, or my own) the book belongs to. If a hundred readers will give us better results than one, why shouldn't digital libraries pool their data with all other libraries around the world, so that there could be ten thousand readers' tags for this chapter?
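To make the pooling concrete, here is a small Python sketch of how tags for one chapter might be combined across libraries. Everything in it is an assumption for illustration: the function names, the data layout (each library contributes one list of tags per reader), and the sample tags themselves. Each reader counts at most once per tag, so the totals say how many readers chose a term.

```python
from collections import Counter


def normalize(tag):
    """Lowercase a tag and collapse runs of whitespace (a crude assumption)."""
    return " ".join(tag.lower().split())


def pool_tags(tag_sets_by_library):
    """Count how many readers, across all libraries, applied each tag.

    tag_sets_by_library maps a library name to a list of per-reader tag lists.
    """
    counts = Counter()
    for reader_tag_lists in tag_sets_by_library.values():
        for tags in reader_tag_lists:
            # A set ensures one reader contributes at most 1 to any tag.
            counts.update({normalize(tag) for tag in tags})
    return counts


# Hypothetical tags for a single chapter, gathered from two libraries.
chapter_tags = {
    "NY Public Library": [
        ["Julius II", "Sistine Chapel"],
        ["Michelangelo", "sistine chapel"],
    ],
    "Montclair library": [
        ["Michelangelo", "David", "Sistine  Chapel"],
    ],
}
pooled = pool_tags(chapter_tags)
top = pooled.most_common(2)
print(top)
```

The long tail of rarely used tags is kept rather than discarded, since those are often the terms a professional cataloguer would never have assigned.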
What I'd like to hear is that I am ignorant of just such a proposal. Do any digital libraries propose such partnership? I know the Simile project at MIT, which is related to DSpace, provides for semantic capabilities. Does it go farther?
I say that digital libraries, in order to be relevant to this age, have to be more than just digital manifestations of the content and catalogs from earlier times. If everyone is an author on the web, so is everyone a cataloguer. When the library patron is an equal partner to the librarian in meeting those five objectives, then we will have a digital library.
[1] As noted in the exchange with Rick Brannan, this applies most significantly to e-books read on booksize devices and not desktops or laptops. But then, I wouldn't wish reading a booklength work on a CRT on anyone.
[2] Despite its imposing title, Svenonius' book is intended to describe libraries in non-technical terms for the non-library professional — in other words, me. What a librarian knows about information organization is something people working with the semantic web also need to know, which is why I highly recommend this book.
[3] In these statements, document stands for any separate object delivered to the user, and should be understood to include audio, visual and multimedia entities. A piece of a larger whole, such as one article in a journal, if delivered as a standalone object is considered a document in this context.
Ideally, of course, the system will deliver all of, and only, the documents desired by the user. That is, the documents supplied should comprise as close as possible to 100 percent of all the relevant documents in the collection (the recall rate), and as close as possible to 100 percent of the documents returned to the user should in fact be relevant (the precision rate).
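The two rates can be computed directly from the set of documents returned and the set of documents that are actually relevant. The Python sketch below uses the standard definitions; the function name and the document identifiers are made up for the example.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents returned / all documents returned
    recall    = relevant documents returned / all relevant documents
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, three relevant in the collection.
precision, recall = precision_recall(
    returned=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
print(precision, recall)
```

Here two of the four returned documents are relevant, so precision is one half, and two of the three relevant documents were found, so recall is two thirds.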
[4] I use the term "tagsonomy" in preference to the better established "folksonomy." Wikipedia applies the latter term to the "practice of collaborative categorization using freely chosen keywords. More colloquially, this refers to a group of people cooperating spontaneously to organize information into categories, noted because it is almost completely unlike traditional formal methods of faceted classification. This phenomenon typically only arises in non-hierarchical communities, such as public websites, as opposed to multi-level teams. Since the organizers of the information are usually its primary users, folksonomy produces results that reflect more accurately the population's conceptual model of the information." If there is a distinction between reader-applied and author-applied tags (e.g., between tags at del.icio.us and technorati), I think that distinction is insignificant compared to the collaborative aspect.



