Wednesday, June 01, 2005
Digitising BOB indexes
Electric Forest is fortunate to receive an email from freelance indexer Linda Sutherland, as several of our posts have dipped into the subject of back-of-book indexes. A point of interest: back in 1991, the Davenport Group (developer of DocBook) was trying to solve problems arising from the inconsistent use of terms in the master indexes of independently developed and rapidly changing technical documentation. The proposed solution was accepted by the ISO/JTC1/SC18/WG8 working group and published as the international standard ISO/IEC 13250:2000, or what we know as Topic Maps.
One or two comments in earlier articles seem to suggest that, if only librarians and publishers were willing, it would be easy to digitise the BOB (back-of-book) indexes of innumerable books, then merge them to form one single ‘mega-index’ to all of the books.
It's an attractive idea, and one which may become feasible in time. But making it so isn't simply a matter of persuading publishers. At least two practical problems will need to be overcome as well.
One of them is copyright. Roger writes of “releasing indexes to print books in electronic form, where no 'piracy' or digital copying could occur”. In fact it could occur, at least in some cases. Freelance indexers own the copyright in their work, except where the contract for an index expressly transfers rights to the publisher. If not transferred, re-use of the index without its creator's permission would certainly merit a black-patched eye.
The other problem is compatibility. A BOB index is an individualised, tailor-made product, crafted to suit one text and its target readership, and subject to any constraints on length, use of subheadings etc. that may have been specified by the client. Co-ordination with other indexes is rarely, if ever, a requirement.
Any attempt to merge indexes would have to cope with the consequences of that individualisation. The problems will include varying levels of specificity/exhaustivity/granularity, non-existent vocabulary control between indexes, and highly context-specific ‘see’ and ‘see also’ references which, if merged without editing, would almost certainly result in a jungle of misdirections.
Imagine merging together all the diaries ever written, then sorting the entries in chronological order. The result would be a history of sorts — but would you expect it to be the clearest, most readable, most reliable, or most succinct of its kind? Similarly, a ‘mega-index’ created by merging BOB indexes may not be entirely useless as a retrieval tool, but without a great deal of editing it will not be nearly as useful as you might expect. — Linda Sutherland
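The compatibility problems described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two tiny indexes, the book labels "A" and "B", and the `merge_indexes` helper are hypothetical, and the merge is deliberately naive (exact heading match, no editing).

```python
from collections import defaultdict

# Hypothetical sample data: two tiny back-of-book indexes.
# Each maps a heading to its page numbers and any 'see' targets.
book_a = {
    "cars": {"pages": [12, 40], "see": []},
    "engines": {"pages": [41], "see": []},
    "motors": {"pages": [], "see": ["engines"]},  # 'motors, see engines'
}
book_b = {
    "automobiles": {"pages": [5], "see": []},
    "motors": {"pages": [77, 80], "see": []},  # here: electric motors
}


def merge_indexes(indexes):
    """Naively merge indexes by exact heading match, with no editing."""
    merged = defaultdict(lambda: {"locators": [], "see": set()})
    for title, index in indexes.items():
        for heading, entry in index.items():
            # Keep the book label with each page so locators stay usable.
            merged[heading]["locators"] += [(title, p) for p in entry["pages"]]
            merged[heading]["see"].update(entry["see"])
    return dict(merged)


merged = merge_indexes({"A": book_a, "B": book_b})

# No vocabulary control: two headings for one concept stay separate.
print("cars" in merged, "automobiles" in merged)

# A 'see' reference written for book A now sits beside book B's pages,
# yet its target holds only book A's locators.
print(merged["motors"])
print(merged["engines"])
```

In this toy case a reader who looks up "motors" finds book B's pages on electric motors next to a cross-reference that only made sense inside book A. That is one small instance of the misdirection an unedited merge can produce.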



7 Comments:
I've been studying the master indexing problem for many years, along with many others, including Murray Altheim. I've come to the following conclusions:
(1) Perspective is everything.
(2) Perspective is more than everything.
(3) ...mumble...[unspeakable]... multiple perspectives...
(4) At any given moment, every human being is the center of the only universe that there is, at least for that human being.
(5) To give utterance is a universe-creating (or at least a universe-reifying) act. That's the real lesson of the Babel Myth. That's the reason why we talk to ourselves, too.
(6) We can pretend that there's a universal universe, but in fact it, too, is just an utterance. There's no possibility of a master index of everything. It's just another "Esperanto fantasy".
(7) But there is hope for something else: each of us can disclose how we address subjects, in such a way that, no matter how diversely we speak, everything we say can be, according to various disclosed opinions (which turn out to require exactly the same kind of disclosure), organized around the subjects that we say we're talking about. The subjects themselves can then become wormholes between our separate universes of discourse, which opens the possibility for master indexes! (But only as a result of the operation of combined disclosed opinions about what's what. The disclosed opinions can be entirely independent of each other, but nevertheless they can be made to work together in concert.)
We've drafted the requirements for such disclosures in ISO/IEC 13250 Topic Maps, Part 5 (http://www.isotopicmaps.org/TMRM/TMRM-latest.html).
We're about to re-draft it, under the more generic name, "Subject Maps", with what is now called "Topic Maps" as an example of a disclosure of a universe of discourse.
I've demonstrated that such disclosures can be made, and that computers can act on those disclosures, in an implementation first made public in May, 2005 (http://www.versavant.org).
Steve Newcomb
P.S. I'd love to go on and on about world peace, relativism, the name of God, the human condition, etc. But usually it just pisses people off. I conclude that I live in a (relatively) weird universe. (;^)
Shortly after I got involved in the XTM community, I began to imagine a large topic map as a kind of "attractor basin", each subject reifier (topic) forming an individual basin in which people would have the opportunity to express their world views. Steve's wormholes satisfy the vision just as well, and he is showing that it can work.
Oh, oh, not -- Esperanto! That stings, Steve.
Well, I deserve it for the sloppy way I introduced the matter.
I don't believe there is a single, universal index any more than other rational beings do. What I do believe is that combining indexes from different books in one electronic, unified index is no worse than the circumstance we have now, which requires you to:
- identify any books that might be on the subject you are interested in, using subject catalogs that don't deal with compound subjects well
- physically obtain each of these books
- manually look up the terms that might be applicable in each printed index
- check each reference
At least if the indexes are electronic and all in the same repository, you can check all the indexes at once, rather than serially. You also pick up any references in other books whose main subject didn't flag them as useful. You can also associate terms with each other, and, as suggested here, disclose something about them.
Of course, you still have to consult each reference manually in this scenario, but getting the indexes into electronic form is surely not as impossible as the books themselves.
Whether there are other advantages, or ways to prevent this from being a huge cacophony, I leave for a later discussion.
Linda's concern about violating indexers' copyrights ignores the way publishers work. If there is a demand / need / marketing opportunity that leads publishers to release the index for public use in this fashion, I have no doubt that the contracts with freelance indexers will be rewritten posthaste to reserve those rights to the publisher. That I think is true for all editorial work-for-hire circumstances.
In a better world, perhaps some idealistic foundation would fund the purchase of such indexes as are not donated for such use. Or maybe freelancers everywhere would adopt the attitude Jimmy Guterman expressed about his biography of Jerry Lee Lewis, "In the spirit of not being greedy by charging for something I've been paid for already, I'm 'reissuing' this book to the Web at no charge."
If someone wants to explain to me how creating a second use for a knowledge worker's product, a use which benefits only readers and has no direct commercial aspect, would exploit these indexers, I will rethink my proposal. If anything, I should think it would increase the demand for their skills, especially as amateur indexers' work would be more easily compared to theirs.
[Irony on] So, you see, Linda, I'm trying to help downtrodden indexers, not rob them. [Irony off] Well, but then this leads to my spiel about world peace, so I'll just end on that note . . .
Roger Sperberg
Roger, who, may I ask, told you that indexers are downtrodden? Here I am, venturing all alone into this strange and bewildering electric forest, standing up for my rights amongst you masters of electronic wizardry and technological juggernautery – and you imply that I’m downtrodden? I’m speechless. :-)
Well, not quite, because I do have a couple of things to say in reply to your comments. Yes, getting indexes into electronic form should be a simple enough process, technically. Merging the electronic versions should be even simpler, and I’ve no doubt whatsoever that speed of access to the indexes would be much greater. But it doesn’t automatically follow that either speed or quality of access to the information would be similarly enhanced. My expectation is that both would be constrained, by the problems I outlined and by another which I forgot to mention, namely the very long strings of page references you’d find attached to some headings.
There again, I haven’t asked you what level of usefulness you would regard as satisfactory for such an index. Unedited, it could certainly provide rapid identification of some relevant items, rather as Google does (or indeed, as a library’s classified catalogue does), and if that were all that was expected of it, an unedited merging might seem good enough. But any more demanding retrieval task would be a tedious business, and I for one would have no confidence in the quality of the result.
On the copyright issue, publishers and indexers may indeed negotiate different terms of contract – for indexes yet to be compiled. But if I’ve understood you correctly, you are suggesting digitising the indexes of books already published, for which terms of contract were agreed and signed up to at some time in the past. For those, a publisher would need to establish which indexes were created by freelance indexers, who the indexer was in each case, what the terms of contract were, and whether the indexer (if contactable) is willing to renegotiate on the matter of rights. Clearly, that’s a non-trivial task, and unless there’s some commensurate benefit in sight for the publisher, one they’re unlikely to undertake.
And finally, exploitation. I hadn’t given thought to the question of financial recompense, but my guess is that different indexers would take different views on that - some the altruistic view, others a more hard-nosed approach. What I’m sure they’d all agree on is that disregarding the rights they hold in relation to work they’ve produced would quite certainly be exploitative, and more.
Although I consider any freelancers in the publishing world downtrodden by definition, I would consider your fear that indexers would be exploited sufficient grounds to qualify -- it is the trodders upon who are exploiters, and so the exploited must be downtrodden.
But, tongue out of cheek, I see a Gordian knot of self-interests in the matter, in which no one will yield the slightest aspect lest somehow they be taken advantage of. Fie upon that!
If I own a copy of the book, how am I somehow exploiting the publisher or indexer to be able to consult my legitimate copy of the index electronically (and merged with others) rather than on paper?
I haven't suggested distributing indexes or any copyrighted material unrestrictedly but only the use of this portion of a book by the purchaser (or the library patron who's going to consult that physical copy of the book).
I'm interested in benefitting the reader, the researcher, the library patron. I would expect publishers and authors to gain, simply from the equation that additional use results also in additional sales. And, as noted, I would expect indexers in general to benefit indirectly, by increased demand for their services, rather than by direct recompense for past indexing.
Agreed, publishers and indexers won't just put these indexes out into the world without reason. And my point was that libraries -- and specifically digital libraries -- would have to push publishers to do this, and individual bookbuyers too. Without that kind of push, I agree with you that publishers won't undertake to do this. But are we digi-libs even talking about it yet? Fie and fooey and heck! What's up with us?
By the way, the big hullabaloo about how awful this unified index would be overlooks a lot of things mitigating the awfulness. If these indexes were in XTM or RDF, perhaps I could create a nonce index on the fly, for just the books that are about XSLT, say, or all the books that have "scholar's mate" in their index. This is one of the things that topic maps can do, and we should let these different capabilities shape our expectations.
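A minimal sketch of such a nonce index follows, in plain Python rather than XTM or RDF. The `library` data, the book titles, and the `nonce_index` helper are all hypothetical, and selection is by exact (case-insensitive) heading match, which is an assumption rather than anything topic maps prescribe.

```python
def nonce_index(indexes, term):
    """Build a throwaway merged index from only those books whose own
    index contains `term` as a heading (case-insensitive exact match)."""
    wanted = term.lower()
    selected = {
        title: index
        for title, index in indexes.items()
        if any(heading.lower() == wanted for heading in index)
    }
    merged = {}
    # Sort by title so the order of locators is predictable.
    for title, index in sorted(selected.items()):
        for heading, pages in index.items():
            merged.setdefault(heading, []).extend((title, p) for p in pages)
    return merged


# Hypothetical library: book title -> {heading: [page numbers]}
library = {
    "Chess Openings": {"scholar's mate": [14], "gambits": [30, 31]},
    "Chess for Beginners": {"scholar's mate": [8], "castling": [22]},
    "XSLT Cookbook": {"templates": [3, 57]},
}

index = nonce_index(library, "scholar's mate")
for heading in sorted(index):
    print(heading, index[heading])
```

Only the two chess books contribute entries here; the XSLT book is left out because its index never mentions the term. The sketch does nothing about the vocabulary and cross-reference problems raised earlier in the thread. It only narrows the set of indexes being merged.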
I think the indexers ought to be pushing towards ways that would make their indexes mergeable, more useful, more flexible. Are there any indexing programs (the kinds that human indexers use in the preparation of their work) that even offer topic map or RDF formats as an export option?
These aren't new technologies, and as Murray points out, topic maps arose from the issue of merging indexes some twelve to fourteen years ago (Steve, straighten us out on the dates, please).
Here's a chance for indexers to lead the way. Shouldn't that be the issue we're talking about and not IP rights?
Roger Sperberg, on his hobbyhorse
In private communication I've been admonished for not making it clear that I, um, think Linda is right here.
I agree, things won't merge easily and there'll be all kinds of bad effects. I happen also to think that there will be so many good effects and ways to mitigate the bad that at the worst it's a wash. That part I've said.
But before Linda wrote, I hadn't really considered the difficulty of managing such a scheme if individual indexers hold the copyright to indexes. It certainly makes for a bigger problem than I originally thought about. OK, Linda, I see your point.
Roger Sperberg
Y'know, I do believe I can see a little bit of harmony appearing, away over there on the horizon!
Talk of benefitting readers and researchers certainly strikes a chord. I was a librarian before I turned to freelance indexing, and I still have the typical librarian's bias in favour of making information accessible, as widely and easily as possible. Many of my fellow indexers share that ethos, particularly those of us who belong to the professional indexers' societies. It's why we're concerned to make indexes that work well, giving the reader quick, accurate and reliable directions to information.
That's why I keep going on about quality. I'm very easily persuaded that your idea would result in quicker access to the merged indexes. But I've seen nothing in what you say to convince me that unedited merging of indexes varying in depth and quality (remember some of the books may not have been indexed by professionals) will improve the speed, the accuracy or the reliability of access to information contained in the books. In fact, I'd anticipate a decrease in retrieval effectiveness, roughly proportionate to the number of indexes merged.
Returning to the copyright issue, there are signs of harmony even there. :-) You are right to describe it as a Gordian knot, and it's one confronting everyone involved in producing or distributing intellectual property these days. I'm no more sure than anyone else is about where the balance will eventually come to rest, between the interests of producers and distributors and those of the users. I am sure, though, that any solution which doesn't acknowledge the importance of the retrieval tools, and both acknowledge and reward the work of those creating them, is unlikely to be the most effective. (Your suggestion that a merging of indexes would lead to increased demand for indexers' services baffles me, by the way. I honestly cannot see how you reach that conclusion.)
The potential for indexers to play a role in pushing for indexes to be made mergeable is, I fear, somewhat limited. That's not — repeat not! — a way of saying they wouldn't want to. But the fact of real life is that a freelance indexer works to a brief provided by the client — i.e. a publisher. If clients want mergeable indexes, indexers will find ways of meeting the requirement. If not, they will spend their time meeting whatever other demands the client makes of them.
For similar reasons, export in Topic Map or RDF format isn't, to the best of my knowledge, a feature of any of the software designed as indexing aids. I can't speak for all indexers, but certainly I know of none who have been asked to supply an index in either format.
Postscript - I've just spotted your later message as I upload this. In return, would like to make clear, if I haven't already, that I'm in no way against your idea in principle. My worries are about the quality of the result and - don't think I've made this explicit before - the poor reflection on the indexing community if it didn't work well.