Thursday, June 23, 2005
Metadata should be free
Here's the lede from Michael Rogers' story at MSNBC:
Several years ago journalist John Lenger told a remarkable story in the Columbia Journalism Review about teaching a journalism class at Harvard's extension school. He asked his young students to write a story about a Harvard land deal that occurred in 1732, but after a week of research, most came back with almost nothing substantial to report. The problem: They had done most of their research using the Internet, walking right past Harvard's library and archives, where the actual information could be found. When Lenger questioned their research methods, one student replied that she assumed that anything that was important in the world was already on the Internet.
This connects to the effort in Rogers' story to transform "books into bytes," but I think the real problem is the poor job we've done of putting metadata about books, such as their indexes, online, where it would make these resources far more likely to be discovered.
Years ago, people said protection was needed against "theft" of software programs, and it took years for the countervailing attitude to appear in a strong, coherent way. Today, free and open software includes operating systems like Linux and Solaris, and applications for word processing, image manipulation, spreadsheets, presentations, software development and a thousand different uses. IBM, Sun, Nokia and other companies have taken IP rights and software that cost millions of dollars and put them into the public domain. We don't say such software is not creative and not worth "protecting," but its free distribution is more valuable to our society than locking it up. Enough people see this that the cultural effect is so far in advance of the laws it's kind of scary.
I'll skip over the arguments for scrapping patent laws that favor one side of this argument over the other. And there are many sites for discussing the similar issues around copyright. In the book world, people are beginning to see this, such as Cory Doctorow, who has posted his third book on the web in a variety of formats, saying "Share this book! That's what it's for." [1] The battleground goes beyond the legal arena to the cultural aspects, I would say. Just what are our obligations to creators and researchers? The IP pendulum needs to swing back to the middle in our minds as much as in our laws.
Metadata, creation and copyright have a tenuous connection to each other that ought to follow the open software model, to my way of thinking. Sure, there is intellectual effort that goes into the creation of an index, say, just as there is in the writing of a software application. But our society will benefit more — from wider, deeper, more accurate searching — when such information can be readily shared.
Metadata should be free. Metadata about non-electronic resources especially.
[1] Doctorow actually goes much further than simply allowing the electronic version of the book to be read for free, saying: "What's more, if you live in the developing world -- a country not on the World Bank's list of high-income countries -- you can do much more. You can make your own editions, charge money for them, make movies, translations, plays and anything else you care to, and charge whatever you want, without sending me one cent -- you don't even need my permission. See the FAQ for more. The only restriction is that you can't export your versions to the world's high-income countries where all my paying customers are. Deal? Deal."



10 Comments:
For a freelance indexer, the crux of the matter is that indexes put food on the table and keep a roof overhead.
I've never understood how developers of free software make a living. If you can explain that to me, then maybe we could look at whether that model could be adapted?
One of the main arguments for free flow of "information" is that it enlarges the market (and I put information in quotes because I mean it in the broadest possible, almost philosophical sense — content, metadata, software, services and beyond).
For freelance indexers, it would mean a greater demand for their services as "publishers" find they can't compete as well if their books can't swim as successfully in the open-information sea. And the term publishers here encompasses companies and individuals that disseminate information, even about themselves, that today we don't think of as publishers.
In the 1980s, there was this type of expansion in the demand for designers as desktop publishing tools greatly increased the marketplace for design services, maybe enlarging it as much as ten times its previous size.
I recognize that this is an opinion, not an analysis based on close study. I'm not an economist (though I played one in college). And I recognize this disclaimer provides little comfort. But can you tell me what effect this will really have on freelance indexers? Just how big is the after-market for BOB indexes? Are there any studies, or can you offer your own estimate?
— Roger Sperberg
Sorry, Roger, but I need to ask a couple of things before I can try to answer your points.
I had not come across "/* until I wandered into this forum, and am not too sure what it means. Nor am I entirely clear what you mean by an after-market for BOB indexes.
Linda, at first I was a bit confused by your comment. Then I realized you actually were confused by the three characters "/*. I'm guessing that you are using the Opera browser. For some strange reason, Opera seems to render content in italics as those three characters (but only on Blogger-based blogs), effectively making a lot of blog entries incoherent. I'd love to know why this happens, but I have no idea. If you want to see what Roger actually said, you can try a different browser, like Firefox. Drives me nuts, as I have a copy of Opera which, other than this bug, is pretty good.
Thanks, Murray. I have Firefox somewhere, shall try with that.
Ah, now that looks better! Thanks again, Murray. Why on earth isn't all this stuff standardised by now?
I wonder if the problem with Opera is related to the character encoding? I noticed a glitch today, where an em dash was misrepresented as the three characters: â€”
The text is actually encoded as UTF-8, and blogger.com reports this correctly, but when accessed via altheim.com it is reported to be ISO-8859-1.
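For anyone curious how one em dash turns into three characters, here is a minimal Python sketch of the mismatch described above, assuming the browser treats a page declared as ISO-8859-1 as Windows-1252 (which is what browsers typically do): the em dash takes three bytes in UTF-8, and a single-byte decoding turns each byte into its own character.

```python
# Minimal sketch: UTF-8 text decoded with the wrong (single-byte) charset.
em_dash = "\u2014"                   # the em dash character
raw = em_dash.encode("utf-8")        # three bytes: E2 80 94

# Assumption: the page is labeled ISO-8859-1, which browsers typically
# treat as Windows-1252 ("cp1252" in Python), so each byte becomes one character.
garbled = raw.decode("cp1252")
print(garbled)                       # three characters instead of one

# Decoding with the charset the text was actually written in restores it.
restored = raw.decode("utf-8")
print(restored == em_dash)
```

The practical fix is on the serving side: the page (or the HTTP Content-Type header) needs to declare the charset the text was actually encoded in.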
I don't mean anything tricky by that phrase. I only want to ask what opportunities for income arise from owning the copyright to an index once it has appeared in a printed book.
For instance, if the publisher of a hardcover edition pays the indexer, and then the book is reprinted in paperback, does the paperback publisher pay the indexer to use the index?
I'm certain a license could be written that would allow libraries and searchers to use the index's information but which would still require anyone producing a new edition of the book (electronic or print) to pay the indexer for the index. In other words, no loss of income.
I know that writers sell the "first North American serial rights" to magazines here in the U.S., which enables them then to sell the same article or variants to smaller magazines or to publications in other countries. And they reserve movie rights and so on, but I'm not sure how closely writers' sales to new markets resemble those of indexers. Any light you could shed on the post-original-sale income (what I called the aftermarket earlier) would be welcome. My surmise is that it is small, but more critically that re-use of the index can be divided into publishing and non-publishing uses, and only the latter freed.
-- Roger
Thanks for the clarification, Roger. I didn't for one minute suspect you of anything tricky, I just wasn't familiar with the term 'after-market'.
There is, I regret to say, no one simple answer to your question. Ongoing publications, such as journals and loose-leaf manuals, are highly likely to generate further income for their indexers. That's because continuity and consistency of indexing is important in those cases, and publishers seem to recognise that employing the original indexer to do updates makes commercial sense. Also, among professional indexers at least, it's considered bad etiquette to update another's work, unless you've first established that the other indexer can't or doesn't want to continue with it.
An index to one book may generate further income for its creator if the book goes into a new edition, or is reprinted as a paperback. The income is less certain, though, than in the case of ongoing publications. Textual scholars apart, users rarely if ever need to retrieve information from more than one edition at a time, making continuity and consistency of indexing much less significant. There is less incentive, therefore, for the publisher to return to the original indexer, except perhaps in the case of an unchanged text reprinted in a different format. For that, updating the existing index could be perceived as the most cost-effective route, and if so the original indexer, for various reasons, is likely to be offered first option on the work.
Incidentally, you cannot simply re-use the index to a hardback when preparing a paperback edition. The difference in page size makes the original index unusable without repagination to match the reformatted text. Currently, that is not the straightforward task many publishers and others imagine it to be, though technology looks likely to simplify it in future.
Your idea of making indexes available for search but not publication is an interesting one, albeit not, I have to say, entirely new. There was a discussion of something similar on an indexers' list a few years ago — prompted, I think, by Amazon making the indexes of some books viewable on its site — but as far as I know no one's taken the idea further.
For my part, I have three thoughts to offer. First, licensing for search should not be obligatory. You asked at some point what our obligations to creators are — well, for me, the right to control what is done with or to the created work is non-negotiable. I may or may not decide to allow others to make use of my indexes in specified ways, but the decision will, and should, be mine.
Second, I can imagine indexers — some of them, at least — permitting search of their indexes for non-profit uses. I cannot imagine any of them being happy with a general license which carried the possibility that others might profit from their work while offering nothing in return.
The third thought stems from the second. It is that license to search would be most likely to work to the mutual benefit of searchers and indexers of humanities books, because that is the area in which commercial considerations seem likely to have least influence. In other areas, for example the sciences, medicine, and law, I suspect indexers would be a great deal more cautious. The information they work with is more likely to have commercial value, and experienced indexers in any of those fields will have built up a large corpus of copyright work, representing a considerable investment of time and effort over the years. In aggregate, their indexes could amount to a commercially valuable database, one which they might reasonably think twice about making too freely available.
Opera supports CSS content, and one of the style sheets for this page (/css/comments/cmt_main.css) has i{content:"\"/*"}, so Opera correctly transforms the content of <i> elements to "/*.
If the stylesheet is under your control, change it; if not, report it as a bug.
Pete