Wednesday, June 27, 2007

Not Quite Dead 

Well, I've crawled out from under the stone I was sharing with Osama (he has a longer beard but I still think I'm a better programmer), and last week began the staged rollout of the Ceryle project that's now been under development since, well, forever it seems. Multiple bunny lifetimes. Almost the entirety of the Bush Administration.

In the interests of getting some documentation in place, I've recently spent a fair bit of energy getting the Ceryle Wiki up and running, and the project is now looking fairly complete as a 1.0 Alpha. Rather than describe the details here I'd suggest checking out the wiki. The project has grown to be too big for one person, so I'm looking for collaborators for various components I simply haven't had the time to work on; check out the Ideas page linked on the sidebar.

I can see that the blog template needs some work. Sigh. It's funny how even digital things get covered in dust when left alone long enough. New accounts, new ISPs, new ownership for Blogger, defunct email accounts, forgotten passwords, and simply spending time on other things. Now that Ceryle's in release it's time to get things up and running again.

Thursday, February 10, 2005

Ceryle and "Grass Roots" Ontologies


From Steal this bookmark!, Katharine Mieszkowski, Salon.com:
"But what's intriguing about 43 Things isn't the voyeuristic itch it scratches, as we get to see so many people baring their heart's desire. What makes the site work is how it connects all these people to each other. By a simple software tweak known as tagging, this site and many others, like the photo site Flickr and the bookmark-sharing system del.icio.us, have found a new way to organize information and connect people. The surprise is that the organizing itself is unorganized -- and yet it works.

On 43 Things you state a goal, such as "write a novel." That immediately links you to all the other people who have the exact same goal. But you also attach tags to your goal -- essentially key words that you choose -- such as "writing," "novel" and "fiction." Tags are not selected from any pre-codified hierarchy set by the site designers. They simply arise from the grass roots -- you and others like you. Now you're suddenly connected to everyone with similar goals, such as "write a good novel" and "write a book and have it published" and "finish my novel.""
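To make the tagging idea in that quote a little more concrete, here's a minimal sketch in Python. It is not how 43 Things actually works (I have no idea what their code looks like); the function names and the sample goals are made up for illustration. It just shows how free-form tags, with no hierarchy behind them, are enough to connect similar goals.

```python
from collections import defaultdict


def build_tag_index(goals):
    """Map each tag to the set of goals that carry it."""
    index = defaultdict(set)
    for goal, tags in goals.items():
        for tag in tags:
            index[tag].add(goal)
    return index


def related_goals(goal, goals, index):
    """Return the other goals sharing at least one tag with `goal`,
    each mapped to the set of tags it shares."""
    related = defaultdict(set)
    for tag in goals[goal]:
        for other in index[tag]:
            if other != goal:
                related[other].add(tag)
    return dict(related)


# Hypothetical sample data: goals and the tags their owners chose.
goals = {
    "write a novel": {"writing", "novel", "fiction"},
    "finish my novel": {"writing", "novel"},
    "write a book and have it published": {"writing", "publishing"},
    "learn to surf": {"sports"},
}

index = build_tag_index(goals)
connections = related_goals("write a novel", goals, index)
```

Nothing here knows what "writing" means. The connection comes entirely from people choosing the same word.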

On reading this article, it strikes me that sites like 43 Things are based on some very similar assumptions to my own work. The Knowledge Representation field is full of researchers who talk about computer-based "ontologies," where the concepts and relationships between concepts are almost always very formally represented, where machine-based reasoning upon these structures is done in a very formal way, according to logical rules. This is anything but "grass roots," as one needs a level of expertise that only a handful of people in the world have. Furthermore, the big epistemological problem behind this entire approach is that people don't think or act in formal, logical ways, and I'm yet to be convinced that the world itself operates according to those rules either. Why would one believe something so complex as the human mind operates on logic when even at the most base level of physics we're dealing with quantum mechanics? John Sowa talks about this issue in a tract called Knowledge Soup (404K PDF).

It's been the crux of my own work to develop something that the Rest of Us can use. It unfortunately was the cause of a fair amount of friction between myself and one of my advisors in the KR field, such that he's no longer my advisor. He felt that my work "didn't provide any contribution" to the field, which is nonsense; if anything, it's his approach to the field that doesn't provide any contribution to the real world we live in, unless one actually believes the hoo-hah that words carry intrinsic meaning, absent any context or human interpreter. Much of the KR work I've seen relies on this nonsensical presupposition. Even a dictionary tells otherwise. Of course, none of this has ever stopped a funded project from "success." The fundamental epistemological issues are all conveniently just swept under a rug.

Now, the limitation here is language. When someone makes a specific statement, we have to take it on faith that the words used by one person mean the same thing to others. This is obviously not true at all, but apparently works well enough that people can actually communicate with each other. There are errors, but life is iterative. The big deal is context, but that's a subject that itself deserves a great deal of thought (more than one blog entry).

So when I see something like this article on 43 Things, where people are just using normal language to create concepts and relationships between them, I take some heart. One has to remember that the words chosen don't have some kind of formal behind-the-scenes, mathematical model, but that doesn't make them un-reasonable. In fact, there's some [minor but] functional reasoning going on by 43 Things when it connects people up by statements they make about their goals and aspirations. It is machine reasoning, just not so complicated. But useful. And not pretending any formality. And in that, it shares a similar approach to Ceryle. Ceryle is about using a small set of existing relationship types (called Association Templates in Ceryle due to its Topic Map background), such that when someone uses one of those existing Associations they're tacitly agreeing to its definition. This is what would be called in more formal terms an "ontological commitment": i.e., by agreeing to use a term one is bound by its meaning, just like in real life.

So in Ceryle, there are three main ones (though once I have the Association Editor finished, you'll be able to make your own):
  1. "is a" (class-instance) e.g., "John [instance] is_a human [class]."

  2. "kind of" (superclass-subclass) e.g., "Human [subclass] kind_of Mammal [superclass]."

  3. "has" (has facet or property) e.g., "John [faceted] has brown eyes [facet]."
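As a rough illustration of the three Associations above, here's a sketch in Python. This is not Ceryle's actual data model or API; the `Association` class, the `TEMPLATES` table and `classes_of` are hypothetical names invented for this example, with the role names taken from the bracketed labels in the list.

```python
from dataclasses import dataclass

# Hypothetical template table: template name -> (first role, second role).
TEMPLATES = {
    "is_a": ("instance", "class"),
    "kind_of": ("subclass", "superclass"),
    "has": ("faceted", "facet"),
}


@dataclass(frozen=True)
class Association:
    template: str
    first: str   # topic playing the template's first role
    second: str  # topic playing the template's second role

    def __post_init__(self):
        # Using a template means accepting its definition; unknown ones are rejected.
        if self.template not in TEMPLATES:
            raise ValueError(f"unknown association template: {self.template}")

    def roles(self):
        """Return a mapping of role name to the topic playing it."""
        first_role, second_role = TEMPLATES[self.template]
        return {first_role: self.first, second_role: self.second}


def classes_of(topic, associations):
    """Collect the classes of a topic: its direct is_a classes,
    plus every superclass reachable through kind_of."""
    found = set()
    pending = [a.second for a in associations
               if a.template == "is_a" and a.first == topic]
    while pending:
        cls = pending.pop()
        if cls in found:
            continue
        found.add(cls)
        pending.extend(a.second for a in associations
                       if a.template == "kind_of" and a.first == cls)
    return found


graph = [
    Association("is_a", "John", "Human"),
    Association("kind_of", "Human", "Mammal"),
    Association("has", "John", "brown eyes"),
]

john_classes = classes_of("John", graph)
```

With just these three statements, `classes_of("John", graph)` finds both Human and Mammal. That's about the level of "reasoning" intended: small, but useful.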

The latter terminology is derived from Faceted Classification (FC, aka Faceted Analytical Theory), an organizational schema coming from library science. The words "property", "attribute", "characteristic", etc. might be considered synonyms, but "facet" is important because by bringing in the FC schema it implies that the specific facet itself fits into a hierarchy, and this hierarchy comes from the superclass-subclass Associations it has within the current ontology. In this way we can understand that brown eyes are a particular kind of characteristic, and in fact the whole structure of facets and facet relationships includes cardinality (how many), data typing, and measurement (such as what units a facet might be measured in). Not to make this discussion any more complicated, but in FC each facet fits into a tree structure, whereas in Ceryle it fits into a graph. In fact, the Topics acting as facets aren't considered special in the ontology — they're just Topics playing the role of facet in a facet Association. More on that later...
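A small sketch of the tree-versus-graph point, in Python. Everything in it is an assumption for illustration: the `FacetSpec` fields, the topic names and the lookup rule are mine, not Ceryle's. The thing to notice is that a facet Topic may have more than one parent, which a strict FC tree wouldn't allow.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class FacetSpec:
    datatype: str                # e.g. "string", "decimal"
    cardinality: int             # how many values a faceted Topic may carry
    unit: Optional[str] = None   # unit of measurement, if any


# Hypothetical kind_of edges: child -> parents. "eye color" has two parents,
# so this is a graph rather than a tree.
KIND_OF = {
    "brown eyes": {"eye color"},
    "eye color": {"color", "physical characteristic"},
    "height": {"physical characteristic"},
}

# Hypothetical specs attached to some of the facet Topics.
SPECS = {
    "eye color": FacetSpec(datatype="string", cardinality=1),
    "height": FacetSpec(datatype="decimal", cardinality=1, unit="cm"),
}


def ancestors(topic):
    """Every Topic reachable from `topic` by following kind_of edges."""
    seen = set()
    stack = list(KIND_OF.get(topic, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(KIND_OF.get(parent, ()))
    return seen


def spec_for(topic):
    """Breadth-first search upward for the nearest Topic that has a spec."""
    queue = deque([topic])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current in SPECS:
            return SPECS[current]
        queue.extend(sorted(KIND_OF.get(current, ())))
    return None
```

So "brown eyes" picks up its data type and cardinality from "eye color", while still being a kind of both "color" and "physical characteristic".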

There are plans and partial implementations of all of this latter stuff in Ceryle, as well as modules for mereology, topology, and other important domains, and I hope to flesh this kind of thing out more in the future. It'll be how you add a birthdate and deathdate to a person, with Ceryle knowing that they're a particular kind of date and being able to do certain kinds of reasoning based on them, such as being able to set all events within a graph into a timeline.
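Here's roughly what I mean by the timeline, as a Python sketch. The `DATE_FACETS` set and the dictionary layout are assumptions for the example, not Ceryle's implementation; the point is only that once a facet is known to be a kind of date, sorting events and computing ages comes more or less for free.

```python
from datetime import date

# Hypothetical: facet names that the ontology knows to be kinds of date.
DATE_FACETS = {"birthdate", "deathdate"}

# Sample Topics with their facets (names and layout are illustrative).
topics = {
    "Ada Lovelace": {
        "birthdate": date(1815, 12, 10),
        "deathdate": date(1852, 11, 27),
        "occupation": "mathematician",
    },
    "Charles Babbage": {
        "birthdate": date(1791, 12, 26),
        "deathdate": date(1871, 10, 18),
    },
}


def timeline(topics):
    """Gather every date-typed facet in the graph and sort chronologically."""
    events = [
        (value, name, facet)
        for name, facets in topics.items()
        for facet, value in facets.items()
        if facet in DATE_FACETS
    ]
    return sorted(events)


def age_at_death(facets):
    """Whole years between birthdate and deathdate."""
    born, died = facets["birthdate"], facets["deathdate"]
    had_birthday = (died.month, died.day) >= (born.month, born.day)
    return died.year - born.year - (0 if had_birthday else 1)


events = timeline(topics)
```

The "occupation" facet is ignored by `timeline` because it isn't a kind of date; that's the sort of distinction the facet hierarchy is there to provide.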

There are a lot of different ontological and epistemological issues that need to be addressed, another being subject identity: how does one ascertain when two subjects are actually identical? Identity is a slippery and tricky concept. By "identical" I don't mean equal but equivalent, and additionally, equivalent within a context. Even subject identity is contextual.

This whole "grass roots" approach to computer-based ontologies is what about 95% of my research has been about. If that isn't a contribution to the Knowledge Representation field, I guess the field isn't much of a contribution to the grass roots either.

I don't believe it.

Wednesday, January 26, 2005

Optimize, Shmoptimize. Pancakes are the thing.


Yeah, I've heard it before, that idea that you shouldn't optimize early, like before 7am. Well, it's 6:52am and I've spent the last six hours optimizing and I've had it up to here (you can't see but I'm pointing to my neck).

This all started when a couple of people using Ceryle thought it was locking up while visualizing a Topic Map, when I was pretty sure it wasn't locking up, it was just taking a damned long time. Part of the problem was a lack of feedback. A few days ago I tried to get the progress bar working again, but found that when I managed to get it to redraw properly, it was now a competing Thread (if it ever got enough time to redraw itself consistently) and bumping an 11 second visualization, Ceryle's Large Graph, up to over 47 seconds. This doesn't sound like a huuuge amount of time, but when you're waiting 47 seconds for something that should take four seconds and your computer just went all mysterious on you, well, you think about giving up, moving to the Bahamas, pouring your life into a bottle of dark rum and watching the hot oiled bikini action on the beach from a sling chair.

I still think about that, but I decided to optimize. So I got out my slide rule and pocket protector and went to work. I changed the visualizer from an invisible object to a dialog box, wrote a bit of code to spit out performance statistics so I could see what was making a difference, and by golly, if I didn't get that 11 second visualization consistently down to 6 seconds, and sometimes below 4 seconds. It even looks nice and has a functional progress bar of its own. Visualizing Topics was always fast, but, this being January, the Associations were like molasses. Now they're a bit like BC Golden Cane Sugar, which is excellent on pancakes. And oh, do I like pancakes. If you're ever in San Francisco, you gotta check out the It's Tops diner at 1801 Market (cross street is Valencia, if I remember right), as their blueberry buckwheat pancakes are the best pancakes in the entire world.
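Pancakes aside, the "bit of code to spit out performance statistics" is the part worth stealing. What follows is not the code I wrote for Ceryle, just a minimal Python sketch of the same idea, with a hypothetical `PhaseTimer` class: wrap each phase of the work in a timer and print the totals, so you can see which phase is the molasses.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock timings for named phases of work."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock               # injectable so it can be tested
        self.samples = defaultdict(list)  # phase name -> list of durations

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the phase raised an exception.
            self.samples[name].append(self._clock() - start)

    def report(self):
        """One line per phase: run count, total time and mean time."""
        lines = []
        for name, times in self.samples.items():
            total = sum(times)
            lines.append(
                f"{name}: runs={len(times)} "
                f"total={total:.3f}s mean={total / len(times):.3f}s"
            )
        return "\n".join(lines)


# Usage: the phase names mirror the post; the work is a stand-in.
timer = PhaseTimer()
with timer.phase("topics"):
    sum(range(1_000))
with timer.phase("associations"):
    sum(range(100_000))
print(timer.report())
```

Measuring first is what told me the Associations, not the Topics, were where the time was going.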

Thursday, January 20, 2005

I Deserve a Cocktail


Another Christopher Robin blustery day, the winds whipped furiously around as I made my final just-one-more-thing pass over everything. I'd planned on releasing the new version last Monday, but the events of the past week have conspired to delay posting until today.

But it's out! Version 1.0 alpha 10, which includes drag and drop, a rearrangement of the menus, assorted other new features, and just works better. There are still some things that don't, and a few things missing, but Ceryle is starting to look a lot more polished. I've still got that big 800 pound gorilla hanging over my head, the Association Editor, which is now sitting as Priority 1 for the next version.

I now have new Ceryle User and Developer mailing lists available. I'm not opening them up for direct subscription, to avoid spammers, but upon a simple request I'm happy to add anyone to either list. I'm still trying to figure out how to set up an online list archive (à la hypermail or equivalent), which would be nice to do before I actually have any message traffic.

But for now: a cocktail.

Monday, December 20, 2004

Hitting Send Never Felt So Good.


That first, virginal Send on the last release was accompanied by a fair amount of anxiety over how Ceryle would perform. Call it "tool anxiety." Well, that first round led to a number of repairs (on some things I didn't even know were problems) and I feel a bit more confident with this second release, almost relaxed. Maybe I should smoke a cigarette. I'm hoping it is good for you too.

In addition to the release of Ceryle 1.0 alpha 9 tonight, I've recently posted some big screenshots of Ceryle, doing what it does. Hopefully this will help people understand at least the rudimentary functions (what a word is rudimentary!). The Dynamic Forms feature is now functional (actually, again; it was busted), and it's now possible to download content right into your Ceryle database from the Web. Remote Help updates! There's more under the covers, and some is still just a twinkling in my eye.

But for now it's nice to be able to take a breather for a day or so.