Electric Forest

thoughts about books, digital libraries, and stuff related to expressing and keeping track of our thoughts...

Wednesday, June 27, 2007

Alien Resurrection

Well, I've crawled out from under the stone I was sharing with Osama (he has a longer beard but I still think I'm a better programmer), and last week began the staged rollout of the Ceryle project that's now been under development since, well, forever it seems. Multiple bunny lifetimes.

In the interests of getting some documentation in place, I've recently spent a fair bit of energy getting the Ceryle Wiki up and running, and the project is now looking fairly complete as a 1.0 Alpha. Rather than describe the details here I'd suggest checking out the wiki. The project has grown to be too big for one person, so I'm looking for collaborators for various components I simply haven't had the time to work on; check out the Ideas page linked on the sidebar.

This message does suggest the Electric Forest blog is again alive after a period in a coma. I'll use EF for posts related to "thoughts about books, digital libraries, and stuff related to expressing and keeping track of our thoughts...", and post things about Ceryle on the Ceryle Blog, which itself needs a resurrection, or at least a little resuscitation...

Tuesday, August 01, 2006

Test post

This blog seems to not want to accept new posts. This is a test. It will be deleted.

Friday, May 19, 2006

If libraries are so important, why are they so underfunded?

Everything the libraries do, they do on shoe-string budgets. I've whinged about resource management and allocation, tools, people and priorities at libraries in the past, but all of that is within the realm of the hand-out from the government each year.

My brother-in-law works in a small branch of a high-priority department, and their budget, for just their little 20-man division, is bigger than the whole yearly budget for the National Library of Australia, including wages and collection building.

Similar stories are found throughout the world; libraries get little funding. Does it follow that libraries are under-prioritised as cultural institutions, or have a lesser place in society? This is where the mystery of human character just amazes me, because the truth is that libraries are seen as pinnacles of modern society, as the custodians of human knowledge and history, on a par with universities and other cultural institutions. Yet, because they generate no income to speak of, libraries are seen as a necessary annoyance in most budgets.

There are so many things I'd like to do, so many wonderful things, in the name of knowledge work, collection access, explorability, services, exposing all that information goodness we've got down in our various dungeons, but where is the public support for making that happen? Do people really believe that all libraries do is let folks borrow books for a little while? Do people not understand the amazing role the culture of libraries has played throughout human development? Don't they understand the amazing stuff we could be doing for the future of all world citizens, stuff that corporations wouldn't even care about?

I fear not.

Sunday, May 14, 2006

Scan This Book!

An excerpt from Scan This Book!, by Kevin Kelly, The New York Times:

"The dream is an old one: to have in one place all knowledge, past and present. All books, all documents, all conceptual works, in all languages. It is a familiar hope, in part because long ago we briefly built such a library. The great library at Alexandria, constructed around 300 B.C., was designed to hold all the scrolls circulating in the known world. [...] Since then, the constant expansion of information has overwhelmed our capacity to contain it. For 2,000 years, the universal library, together with other perennial longings like invisibility cloaks, antigravity shoes and paperless offices, has been a mythical dream that kept receding further into the infinite future.

Until now. When Google announced in December 2004 that it would digitally scan the books of five major research libraries to make their contents searchable, the promise of a universal library was resurrected. [...]"

I note that while the article mentions Google over fifty times, it goes to some pains not to mention any of the successful competition to the Google book scanning project, so much so that, when quoting Brewster Kahle, it describes him as an archivist overseeing another scanning project rather than actually mentioning The Internet Archive, which he founded. Nor does it mention the Open Content Alliance (to which Google's competitor Yahoo is a primary contributor), Project Gutenberg (which has been around since 1971), or, despite the obvious congruence of subject, Carnegie Mellon's Universal Library.

Google's motto may be "don't be evil", but I think it's important to remember that their scanning projects are hardly philanthropic; indexing all the world's knowledge is a necessary part of their business model, so that, like any good corporate citizen, they can better serve their shareholders by increasing ad revenues. Unlike public libraries, Google has a business interest in seeing projects like the Open Content Alliance fail if they cut into its profits. Google is not open, nor should we expect them to be. They just happen to have a boatload of cash, unlike most libraries. Figuring out how to work with Google without selling their soul (or their holdings) to the non-evil empire is the razor's edge that libraries are now being asked to walk. Given that there are some good, viable alternatives, they can choose to walk on a different razor.

Tuesday, May 02, 2006

TinyWages

I had an idea. Maybe it's not a new idea, but it's an idea, and I don't get them all that often, or at least ones I think are worthy of sharing with the entire nation of the United States of America, or even those who might read a given newspaper's editorial pages. There are lots of people who have a bigger audience than I do (I have no doubt of that, nor would even my Mom, though I know nobody speaks anymore to the entire Nation, certainly not even the current President), but I thought I'd throw this bone out to y'all. If it was your idea before I told it to you (perhaps you had this same idea in the shower this morning but hadn't had a chance to write about it, or you had the idea years ago, when you were still young, and simply hadn't written anything about it publicly), I don't mind if you claim it for your own. We both know what's happened to concepts of "intellectual property" over the past decade or so, and I've all but given up that fight. I'm not even sure I want to be an intellectual anymore; look what it's done for W.

The idea is a coupling of the "tiny URL" idea and those garment labels that tell you what country the shirt was sweatshopped in. This could also apply to other products, but that alone might kill the idea, so I'm just sticking to clothing (though knowing how much the Chinese worker who made that toaster is making would be the same order of information as knowing what the Northern Mariana Islander who sewed the garment makes).

The tinyURL idea is that one can use a short code as a proxy for a longer URL, in this case a record in a US government database. E.g.,
     http://tinyurl.com/olzcv
This would be a proposal for a Congressional bill requiring that, right underneath the "Made in Guatemala" or "Made in China" on the label, there would also be a small URL (probably about 15-20 characters): a code you could type into your browser to display a Web page, hosted by an appropriate branch of the US government, that would look up the code in a database and return the average wage (converted into current US dollars) of the workers who made that specific garment. This average would include the wages of the cutters, seamstresses, etc., but not those of management, as that would inflate the average. The data would be updated every business quarter, perhaps, or on whatever the typical business reporting cycle is.
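As a rough illustration of how small the lookup side could be, here's a minimal Python sketch. It assumes a hypothetical registry that hands out short codes by base62-encoding a sequential record id, and records that store each worker's role and daily wage in local currency alongside an exchange rate supplied with the report. All the names (`WageRegistry`, `GarmentRecord`, `encode_base62`) and the numbers in the example are made up for illustration; they are not real wage data or any existing government system.

```python
from dataclasses import dataclass, field
from statistics import mean

# 62 URL-safe characters: digits, lowercase, uppercase.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_base62(n: int) -> str:
    """Turn a non-negative integer record id into a short code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


@dataclass
class Worker:
    role: str          # e.g. "cutter", "seamstress", "manager"
    local_wage: float  # daily wage in the local currency


@dataclass
class GarmentRecord:
    country: str
    usd_per_local: float  # exchange rate reported with the quarterly data
    workers: list = field(default_factory=list)

    def average_wage_usd(self) -> float:
        # Leave management out so it doesn't inflate the average.
        production = [w.local_wage for w in self.workers if w.role != "manager"]
        return round(mean(production) * self.usd_per_local, 2)


class WageRegistry:
    """Hypothetical lookup service: short code -> garment wage record."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def register(self, record: GarmentRecord) -> str:
        code = encode_base62(self._next_id)
        self._next_id += 1
        self._records[code] = record
        return code

    def lookup(self, code: str) -> float:
        return self._records[code].average_wage_usd()


if __name__ == "__main__":
    registry = WageRegistry()
    # Illustrative numbers only, not real wage data.
    code = registry.register(GarmentRecord(
        "Exampleland", 0.13,
        [Worker("cutter", 40), Worker("seamstress", 32), Worker("manager", 300)],
    ))
    print(code, registry.lookup(code))  # 1 4.68
```

A real service would sit behind a web server and a proper database, but the core is just a dictionary lookup plus an average that skips management, as described above.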

The cost of this programme would be borne by the corporations who pay these low wages (they can assuredly afford the additional reporting cost), and would amount to reporting the worker wages for all products sold in the US to an appropriate government branch, which would then use that information to provide online access to the database. Corporations could be exempted from the requirement (and get to display a "living wage" marker) if they pay their workers above some reasonable measure of a living wage. This would be a tough one to determine: what is a real living wage in Indonesia?
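The exemption test itself is easy to state once someone has settled the hard question of what the living-wage benchmark is; this sketch simply takes that benchmark as an input (`living_wage_usd`, a hypothetical parameter). It assumes the rule is applied to the lowest-paid production worker rather than to the average, since an average can hide a few very badly paid workers: daily wages of $2 and $20 average $11, which would clear a $10 benchmark even though one worker earns a fifth of it. Both function names are made up for illustration.

```python
from statistics import mean


def qualifies_for_living_wage_marker(daily_wages_usd, living_wage_usd):
    """Hypothetical rule: every production worker must earn at least
    the living-wage benchmark for their country."""
    if not daily_wages_usd:
        return False  # no reported wages, no exemption
    return min(daily_wages_usd) >= living_wage_usd


def average_rule(daily_wages_usd, living_wage_usd):
    """A looser alternative that only compares the average wage."""
    return bool(daily_wages_usd) and mean(daily_wages_usd) >= living_wage_usd


if __name__ == "__main__":
    wages = [2.0, 20.0]  # illustrative numbers only
    print(average_rule(wages, 10.0))                      # True
    print(qualifies_for_living_wage_marker(wages, 10.0))  # False
```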

The computer side of this project is simple; the bulk of the work would be the organization, accumulation, and management of the data.

The one big benefit would be to make the actual wages of workers worldwide transparent, even if few people went to the trouble of looking up their products' worker wages. But there might be flow-on effects, such as people beginning to buy from countries that pay their workers more reasonable wages (which might cause wage competition, though I'm perhaps being overly optimistic here). It would certainly let Americans know that there are hidden costs being borne by somebody, and that those cheap Chinese products are cheap for at least one well-known reason.

When I've had the opportunity to buy local or buy Chinese, the former is almost always more expensive than the latter, but I know that, at least where I live, the labour laws demand that people get paid fairly. I'm probably an oddball, but I'm often willing to pay more for fair pay. What I'd like to do is narrow the gap between the two so that the decision is easier for conscientious people to make. I'm not sure how to reach the non-conscientious, though letting them know that somebody got paid 35 cents to make their shirt might have an impact on some.

Last year I had the opportunity to buy a bed made in the north of England for 800 pounds, or a Malaysian import for $600. The latter was of nice teakwood but had poorly mortised joints and visible glue fill everywhere (I mean, how much can one expect when the carpenters are paid $2 a day?); the former was obviously made by proud British craftsmen (when they were sober, anyway). I ended up buying a relatively expensive (and really beautiful) bed made of native but sustainable woods, made within a few hundred miles of where I live. I sleep well at night now, though I'm still paying off the cost of the bed.

Wednesday, December 07, 2005

Is there a cure for my sinful Topic Maps thinking?

A while back I wrote an article about fixing really bad XSLT code, mostly digging into the functional and declarative nature of XSLT. One part of it talked about schema design, and how this practice is, to me, a wonderful exercise in trying to get to the core of my data. I found that this sits in the middle of the Data First vs. Structure First war.

So, not that long ago, I wrote a personal email to one of the maintainers of the XOBIS specification, a library-grown XML format for better handling of bibliographic metadata. The email was a requested critique of the format seen through Topic Maps goggles, basically dipping into the whole "better semantics through fewer atomic bits" idea, where you express models not through the semantics of element and attribute names but through the actual content, which is pretty much the core of Topic Maps. I explained how his entire schema could be represented in a simplified and elegant Topic Maps XML format, and Behold! there were converts.

With every new XML schema that comes along (and, oh boy, a lot of old ones as well!), it seems most people who create them fall into the pit of tying the semantics of their language to the data structures that represent their content. I've always felt this is a bit back to front, and, heck, it's one of the main reasons I turned to the Topic Maps side of things, even if I can recognize the temptation to do otherwise. So.

Back to the beginning. Whenever I come up with some schema of sorts to fix some internal or external issue, all problems look like nails that I can bang my Topic Maps hammer onto. But we've been taught through time and credo that this is a very bad thing indeed, and that I should be ashamed of thinking this way. I should repent, but I can't help seeing any XML schema in a Topic Maps light, and wondering why they didn't simply create a typed topic as a role-player in a given association structure instead of ugly, clunky and complex constructions.
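To make the "typed topic as a role-player" idea a little more concrete, here's a rough Python sketch. It is not XTM syntax or any Topic Maps engine's API, just a toy model in the spirit of the Topic Maps data model: topics, associations and roles, where the types of each are themselves topics. A bespoke schema might hard-wire authorship as an `<author>` element inside `<book>`; here it lives in the data. The class names, the `players()` helper and the example ontology are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Topic:
    id: str
    name: str
    type: Optional["Topic"] = None  # a topic's type is just another topic


@dataclass(frozen=True)
class Role:
    type: Topic    # the part being played, e.g. "author"
    player: Topic  # the topic playing that part


@dataclass(frozen=True)
class Association:
    type: Topic
    roles: Tuple[Role, ...]


def players(assocs, assoc_type, role_type):
    """Topics playing role_type in associations of assoc_type."""
    return [r.player
            for a in assocs if a.type == assoc_type
            for r in a.roles if r.type == role_type]


# The "schema" is data: every type below is an ordinary topic.
person = Topic("person", "Person")
work = Topic("work", "Work")
authorship = Topic("authorship", "Authorship")
author_role = Topic("author", "Author")
work_role = Topic("authored-work", "Authored work")

dexter = Topic("dexter", "Pete Dexter", type=person)
train = Topic("train", "Train", type=work)

assocs = [Association(authorship, (Role(author_role, dexter), Role(work_role, train)))]

if __name__ == "__main__":
    print([t.name for t in players(assocs, authorship, author_role)])  # ['Pete Dexter']
```

Adding, say, a translator relationship means minting a couple more topics (`translation`, `translator`) rather than adding a new element to the schema, which is exactly the temptation being confessed to here.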

Unisex datamodels are daringly sexy, oh so sexy, but are there times when I should fight my lustful way of life and simply settle down with something uglier and be happy with that? I am awash with filthy thoughts.

Friday, November 11, 2005

Library of the Future. 7

The books in the Library of the Future will be more like Paul Ford’s Ftrain than Pete Dexter’s Train.

Photo of F train by John Villanueva, www.transitspot.com. Used by permission.

Thursday, October 20, 2005

Topic Modeling

In my work with the IRIS project, I have had the opportunity to meet and work with people doing topic modeling. Sounds like an exciting subject. It is. Take Latent Semantic Analysis, tweak it into something called Latent Dirichlet Allocation, add in Gibbs sampling and Chinese restaurant processes, and you've got a probabilistic means by which topics can be harvested from large bodies of information resources. Some interesting papers can be found at:
http://www.pnas.org/cgi/reprint/0307752101v1.pdf
http://eprints.pascal-network.org/archive/00000990/01/WS905BuntineW.pdf
http://cog.brown.edu/~gruffydd/papers/ncrp.pdf
http://cog.brown.edu/~gruffydd/papers/author_topics_kdd.pdf
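For the curious, here's roughly what "add in Gibbs sampling" looks like in practice: a minimal, unoptimized sketch of LDA fitted with collapsed Gibbs sampling, in the style of Griffiths and Steyvers' 2004 PNAS paper. It's a toy illustration of my own, not code from the IRIS project. The function name, the hyperparameter defaults (`alpha=0.1`, `beta=0.01`) and the six-word corpus are assumptions chosen for readability, and the Chinese restaurant process models (which let the number of topics grow with the data) aren't covered.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (z, phi): per-token topic assignments and per-topic word distributions.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k

    def update(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Start from a random topic for every token.
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # take token i out of the counts
                # p(z_i = k | everything else) is proportional to
                # (n_kw + beta) / (n_k + V * beta) * (n_dk + alpha)
                weights = [(n_kw[k][w] + beta) / (n_k[k] + V * beta) * (n_dk[d][k] + alpha)
                           for k in range(K)]
                z[d][i] = rng.choices(range(K), weights=weights)[0]
                update(d, w, z[d][i], +1)

    # Point estimate of each topic's distribution over words.
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return z, phi


if __name__ == "__main__":
    vocab = ["book", "library", "catalog", "code", "compiler", "bug"]
    docs = [[0, 1, 2, 1, 0], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 4, 5]]
    z, phi = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda w: -row[w])[:3]
        print(k, [vocab[w] for w in top])
```

Each sweep pulls one token out of the counts, scores every topic by how much that topic likes the word times how much the document likes the topic, and redraws the token's topic from those weights; over many sweeps, words that co-occur drift into the same topic. On a corpus this tiny the two topics will usually, though not necessarily always, separate the library words from the programming words.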

Wednesday, October 12, 2005

In the Pay Library of the (near) Future, 5

In the Pay Library of the (near) Future, you'll be able to purchase or rent a book from any publisher as easily as you can get money from any bank's ATM today (even if you're overseas — systems in different countries will interact with each other). Some publishers, and some readers, will prefer this distribution channel to downloading. They'll carry their library in their wallet with their credit cards.