splitting semantic hairs

Adam Bosworth’s excellent ISOC’04 talk has inspired some apparently-indignant comment. But his recent post rubs me the wrong way.

There’s a pretty common conversational pattern that emerges in the RDF vs. XML discussions, stemming from the fact that the RDF people are never really talking about XML. The discussion really decomposes along two axes: RDF/XML vs. “vanilla” XML, which is a question of syntax, and RDF vs. no-DescriptionFramework, which is a question of the particular data-modeling framework of RDF vs. an ad-hoc or informal one.

It’s certainly easy to invent namespaces and to coin terms — for both things and thing-relations — within those namespaces. It’s also easy to declare how some piece of XML should look … what should be tags and what should be attributes and what they should contain in order to represent a particular structure of information. But, yes, that’s the easy part. The hard part is coming to agreement on those items.

The difficulty of reaching agreement comes in two forms: syntactic and semantic. RDF looks to focus on expressing semantics at the expense of some human readability of the syntax. XML is all about a particular verbose syntax, and says nothing (or, maybe, the wrong things) about the semantics.

Semweb proponents believe that the simple S-P-O and URI framework encourages interoperability [my properties can be in-relation-to your subjects if we both agree that subjects are URIs] and representational complexity [i.e., graphs, not trees], and will thus allow conversations to focus on the actually-hard part: the modeling and particulars of the terms involved.
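A minimal sketch of that interoperability claim, assuming statements are modeled as plain Python tuples rather than with any real RDF library. The example.org vocabularies and the `merge` and `describe` helpers are made up for illustration; only the post URI comes from this discussion. Two parties describe the same URI independently, and combining their statements is nothing more than set union:

```python
# Each statement is a (subject, predicate, object) tuple; a graph is a set of them.
# The example.org terms below are hypothetical, for illustration only.
POST = "http://www.adambosworth.net/archives/000034.html"

my_graph = {
    (POST, "http://example.org/mine#rating", 9),
}
your_graph = {
    (POST, "http://example.org/yours#title", "Well!"),
    (POST, "http://example.org/yours#repliesTo", "http://example.org/some-other-post"),
}

def merge(*graphs):
    """Merging graphs is set union; the shared subject URI does the joining."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged

def describe(graph, subject):
    """Return every (predicate, object) pair stated about a subject."""
    return {(p, o) for (s, p, o) in graph if s == subject}

combined = merge(my_graph, your_graph)
about_post = describe(combined, POST)
```

Because objects can themselves be URIs that appear as subjects elsewhere, the result is a graph rather than a tree, with no container element deciding who nests inside whom.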

A lot of the syntactic issues shouldn’t be issues at all: it doesn’t matter if you put the title in an attribute or in an element. It does matter if you leave out a referent. If you want to represent something that’s not naturally acyclic … well … you should be able to. If I want to further describe some resource on the web, I should be able to, using its URI:

@prefix rev: <http://www.purl.org/stuff/rev#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://www.adambosworth.net/archives/000034.html>
    dc:title "Well!" ;
    dc:creator [ a foaf:Person ; foaf:name "Adam Bosworth" ] ;
    rev:hasReview [
        a rev:Review ;
        dc:creator [ a foaf:Person ; foaf:nick "jsled" ] ;
        rev:rating 9
    ] .
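To make the attribute-versus-element point concrete, here is a rough sketch using Python’s standard `xml.etree.ElementTree`. The two `item` formats and the `to_triples` helper are hypothetical, not taken from any real spec. The idea is only that once both spellings are mapped to the same statement, the syntactic choice stops mattering:

```python
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same fact: one uses an attribute,
# the other a child element. Neither format comes from a real spec.
AS_ATTRIBUTE = '<item about="http://example.org/post" title="Well!"/>'
AS_ELEMENT = '<item about="http://example.org/post"><title>Well!</title></item>'

def to_triples(xml_text):
    """Map attributes and child elements alike to (subject, predicate, object)."""
    root = ET.fromstring(xml_text)
    subject = root.attrib["about"]
    triples = set()
    # Every attribute other than the subject marker becomes a statement.
    for name, value in root.attrib.items():
        if name != "about":
            triples.add((subject, name, value))
    # Every child element becomes a statement too, keyed by its tag.
    for child in root:
        triples.add((subject, child.tag, (child.text or "").strip()))
    return triples

same = to_triples(AS_ATTRIBUTE) == to_triples(AS_ELEMENT)
```

Leaving out the `about` attribute, on the other hand, makes `to_triples` fail with a `KeyError`: without a referent there is nothing to make a statement about.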

Perhaps more importantly, however, you should be able to leverage the many person-years that have gone into well-crafted specs like FOAF and Dublin Core, so that you don’t have to re-invent the wheel. You don’t want to re-invent the wheel, because doing so forces you to re-negotiate those agreements all over again.

Let’s postulate a Resource-and-Site-Summary format … RaSS … that has an open no-DF extension policy: in order to define extensions, you simply craft a new namespace and a set of terms in that namespace plus some rules about what’s an element vs. an attribute, publish it, get it noticed and get tool support built to understand the particular semantics it defines.

Note that while something like Relax-NG or XMLSchema can help with the syntactic definition, the semantic description is mostly in human-readable text.

Once you do that enough, you’ll want some reasonably standard way to define such extensions.

Moreover, you’re going to need to enforce some consistency in how things are modeled in order to promote interoperability. Maybe a regular pattern for statements. Referring to things at multiple places in the same document. Lexical interpretation of datatypes. Domain/range constraints. Relation cardinality constraints.
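As a rough illustration of the last two items, here is a sketch of what machine-checkable domain/range and cardinality rules might look like. The `SCHEMA` layout, the property names, and the `check` function are all invented for this example. Note also that it treats domain and range as validation checks, which is a simplification: RDF Schema itself uses them to infer types rather than to reject data.

```python
from collections import Counter

# Hypothetical schema: each property names the type its subject must have
# (domain), the type its value must have (range), and an optional limit on
# how many values one subject may carry.
SCHEMA = {
    "creator": {"domain": "Document", "range": "Person", "max_count": None},
    "rating": {"domain": "Review", "range": "int", "max_count": 1},
}

def check(triples, types):
    """Return a list of human-readable constraint violations."""
    problems = []
    for s, p, o in triples:
        rule = SCHEMA.get(p)
        if rule is None:
            continue  # properties without a rule are simply not checked
        if types.get(s) != rule["domain"]:
            problems.append(f"{s}: {p} expects a {rule['domain']} subject")
        obj_type = "int" if isinstance(o, int) else types.get(o)
        if obj_type != rule["range"]:
            problems.append(f"{s}: {p} expects a {rule['range']} value")
    # Cardinality: count how often each (subject, property) pair occurs.
    counts = Counter((s, p) for (s, p, _) in triples)
    for (s, p), n in counts.items():
        limit = SCHEMA.get(p, {}).get("max_count")
        if limit is not None and n > limit:
            problems.append(f"{s}: {p} appears {n} times, limit is {limit}")
    return problems

types = {"post": "Document", "review1": "Review", "jsled": "Person"}
good = [("post", "creator", "jsled"), ("review1", "rating", 9)]
bad = [("review1", "rating", 9), ("review1", "rating", 3), ("post", "rating", 5)]

good_problems = check(good, types)
bad_problems = check(bad, types)
```

Here `good` passes cleanly, while `bad` trips both kinds of rule: a rating attached to a Document, and two ratings on one Review. The point is that the rules live in data a tool can read, not in prose a person has to interpret.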

At the end, I bet what you end up with is a lot like RDF, RDF Schema and OWL.

Or perhaps it’s not.

But I’m in agreement with Danny, here: structured information is better than unstructured, and RDF is a simple and compelling way to structure data.

Alternatives?