posting delay; homebrew recipes and the semantic web

Sorry for the long delay in posting. A few months ago, the girlfriend and I bought a sinkhole of time and materials generally referred to as a house. This one was particularly large in that it needed some serious internal renovation before being habitable. Between that and a desire to focus this less on just talking and more on talking about production, I’ve not had much time or much to write about…

For the past couple of years, I’ve been pretty interested in semantic web technologies, and in RDF in particular. I came to the space looking for rigor about how to use XML to model data. What I found was really what I was looking for: both the recognition of XML’s true nature as well as a better way to actually do the data-modeling I was seeking… or so I thought.

Like many others, I fell into the trap of “RDF for everything!”, neglecting to remember that it is the (web) Resource Description Framework. While many things are, or can be (and should be), Resources, the framework is set up more for the wide-scale exchange of information about those resources, not as a general-purpose “small-scale” data-modeling framework. However, many things can be resources…

One project I started just about a year ago, when I started homebrewing, was keeping my homebrewing journal in a file. I knew that I wanted to keep notes in a good bit of structured detail, but I didn’t want the pain associated with authoring them in XML. I decided to author them in RDF, in N3, along a loose ontology that I would create and extend as I authored.

The main aspects to record in such a journal are the following:

  • the recipe itself
  • grains, adjuncts, hop-schedule, and notes
  • instances of brewing the recipe
  • deviations from the recipe
  • notes about the brewing process (e.g., temperatures reached, times of boiling, &c.)
  • readings of the specific gravity of the wort at the beginning and end of the fermentation.

As such, I ended up with notes that look like:

    <#maplePorter> a hb:Recipe ;
      rdf:label "Maple Porter" ;
      dc:created "2005-04-02-05:00"^^xsd:date ;
      foaf:maker <#jsled> ;
      hb:ingredient
        [ a hb:Grain;
          hb:amt [ rdf:value 4.75; hb:units hb:lb ];
          rdf:label "pale extract" ]
       ,[ a hb:Adjunct;
          hb:amt [ rdf:value 2.5; hb:units hb:lb ]
                ,[ rdf:value 43; hb:units hb:oz ];
          rdf:label "Vermont maple syrup; unblended @ 20 min remaining" ]
       ,[ a hb:Hop;
          hb:amt [ rdf:value 2.0; hb:units hb:oz ];
          rdf:label "Goldings";
          hb:hopTime [ rdf:value 60; hb:units hb:min ] ]
      .

    <#aprMaplePorter> a hb:BrewInstance ;
      hb:recipe <#maplePorter> ;
      dc:created "2005-04-18-05:00"^^xsd:date ;
      foaf:maker <#jsled> ;
      hb:journal
        [ a hb:OriginalGravityReading;
          dc:created "2005-04-18-05:00"^^xsd:date;
          rdf:value 1.042;
          hb:temp [ rdf:value 60; hb:units hb:F ] ]
       ,[ dc:created "2005-04-18-05:00"^^xsd:date;
          rdf:label "Used Wyeast 1318 (London Ale Yeast III), cultured Feb 15 2005, incubated for 2 days" ]
      .

For quite a while I’ve been hoping to publish the collection of data, autogenerated from the RDF into HTML. I finally sat down and did it. http://asynchronous.org/homebrew/ contains both.

The pipeline is the “standard” one: cwm plus a bit of inference into XML, then XSLT into HTML. http://asynchronous.org/homebrew/meta contains a bit of the detail. It’d be nicer to use an RDF-specific path/templating/transformation language, but xsltproc is widely available and simple.

A major part of the process was cleaning up the journal data and making it consistent. In doing this, a couple of issues with the ad-hoc formatting I’d naturally done in the RDF became much clearer:

  1. The relation between a recipe and an ingredient should really be made quite regular through the use of a URI for the ingredient, rather than simply a value string. While true in recipes generally, this is particularly true in beer recipes, which draw from a small, fixed set of ingredients and vary primarily in the n-ary relations of those ingredients to the recipe.

  2. What are currently n3-file-local fragments should really be first-order resources at http://asynchronous.org/homebrew/recipes/<filename>. One thing that RDF makes particularly painful is the transition from a very informal organization to a concrete one.
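The first point can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `http://example.org/ingredients/` namespace, the second recipe (`esb`) and the label spellings are invented, and plain Python dictionaries stand in for the RDF graph. It only shows why an index keyed on URIs joins recipes that free-form labels keep apart.

```python
from collections import defaultdict

# Hypothetical namespace for shared ingredient resources (not from the journal).
ING = "http://example.org/ingredients/"

# Recipes that name ingredients with free-form label strings.
by_label = {
    "maplePorter": ["Goldings", "pale extract"],
    "esb": ["goldings (East Kent)", "Pale Extract"],  # invented second recipe
}

# The same recipes with each ingredient named by a URI.
by_uri = {
    "maplePorter": [ING + "goldings", ING + "pale-extract"],
    "esb": [ING + "goldings", ING + "pale-extract"],
}

def recipes_by_ingredient(recipes):
    """Invert a recipe -> ingredients mapping into ingredient -> recipes."""
    index = defaultdict(set)
    for recipe, ingredients in recipes.items():
        for ingredient in ingredients:
            index[ingredient].add(recipe)
    return dict(index)

label_index = recipes_by_ingredient(by_label)
uri_index = recipes_by_ingredient(by_uri)

# With labels, every spelling is its own key; with URIs, recipes line up.
print(len(label_index), len(uri_index))
```

With labels the two recipes share no keys at all, so a question like “which recipes use Goldings?” needs string-matching heuristics; with URIs it is a plain lookup.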

LML and RX

Last weekend I got to thinking about alternative representations for RDF in XML, inspired by Bill de hÓra’s LML format.

Over the course of this week, there was some discussion on www-rdf-interest that prompted me to think through some of the details.

This weekend, I wrote up what I was thinking a bit more specifically. The result is a format for RDF in XML called RX. It’s basically an unstriped syntax for RDF in XML which maps RDF Properties to XML elements, with RDF/XML’s parseType="Resource" the default.
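To make the idea of an unstriped mapping concrete, here is a minimal Python sketch. It is not the RX format itself, only an illustration under stated assumptions: every child element is read as a property, an element with child elements becomes a fresh blank node (the parseType="Resource" behaviour applied everywhere), and an element with only text becomes a literal. The function name `unstriped_to_triples`, the element names in the sample and the `_:bN` blank-node labels are all invented for the example.

```python
import itertools
import xml.etree.ElementTree as ET

def unstriped_to_triples(xml_text, root_subject="_:root"):
    """Read every element as a property; nested elements become blank nodes."""
    counter = itertools.count(1)
    triples = []

    def walk(subject, element):
        for child in element:
            if len(child):
                # Child elements present: the property's value is a new blank node.
                node = f"_:b{next(counter)}"
                triples.append((subject, child.tag, node))
                walk(node, child)
            else:
                # Text only: the property's value is a plain literal.
                triples.append((subject, child.tag, (child.text or "").strip()))

    walk(root_subject, ET.fromstring(xml_text))
    return triples

# Hypothetical sample document; the element names are made up.
doc = "<lml><version>001</version><author><name>jsled</name></author></lml>"
triples = unstriped_to_triples(doc)
for triple in triples:
    print(triple)
```

A real format would also need namespaces, datatypes, node identifiers and collections; the sketch leaves all of those out.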

Coming full circle, I can now re-express LML in RX, in order to get an idea of how different it is from Bill’s pure-XML syntax.

The major difference compared to Bill’s format is the moving of data out of attributes and into XML elements. RX just doesn’t allow attributes for data, since they are yet another serialization option that it is possible to do away with.

In any case, here’s the same LML example from last week, but in RX:

001 Bill de hora jsled rdf http://www.w3.org/RDF/ markup http://www.w3.org/XML/ 2005-01-08T10:57-0500 2005-01-08T10:57-0500 http://asynchronous.org/ns/lml yes original post http://www.dehora.net/journal/2005/01/lmllistmarkuplanguage.html Danny’s response http://dannyayers.com/archives/2005/01/08/list-markup-language/ followup to Danny http://www.dehora.net/journal/2005/01/dataabovethelevelofasinglesite.html

LML, XML and RDF

Bill de hÓra published a nice-and-simple List Markup Language today. He’s framed it as an XML language, and notes that it would be nice to have an XSLT transform into RDF. In a subsequent post, he talks a bit, basically, about the RDF tax.

So, given that RDF is not RDF/XML, and since it may be time to deprecate RDF/XML anyways, I was thinking two things:

  1. What’s the trivial translation into the RDF model?
  2. How far is the presented XML from RDF?

In Turtle

To get at the first, I simply re-wrote the presented data in Turtle, though I wanted a bit more meat, so I added some content to the example (more specifically, since there’s so little syntactic framing in Turtle, there’s not much left if you don’t put example data in…):

    # -*- n3 -*-
    # turtle, actually, but I only have an 'n3' mode for emacs…

    @prefix i: <http://www.dehora.net/lml/2005/01>.
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    <#> a i:lml ;
      i:version "001" ;
      i:published [ i:when "2005-01-08T10:57-0500"^^xsd:datetime ] ;
      i:changed [ i:when "2005-01-08T10:57-0500"^^xsd:datetime ] ;
      i:author [ i:name "Bill de h\u00D3ra" ]
              ,[ i:name "jsled" ] ;
      i:category [ i:name "rdf"; i:subject "http://www.w3.org/RDF/" ]
                ,[ i:name "markup"; i:subject "http://www.w3.org/XML/" ] ;
      i:list [ i:ordered "yes" ;
               i:href "http://asynchronous.org/ns/lml" ;
               i:items (
                 [ a i:item ;
                   i:content "original post" ;
                   i:href "http://www.dehora.net/journal/2005/01/lmllistmarkuplanguage.html" ]
                 [ a i:item ;
                   i:content "Danny’s response" ;
                   i:href "http://dannyayers.com/archives/2005/01/08/list-markup-language/" ]
                 [ a i:item ;
                   i:content "followup to Danny" ;
                   i:href "http://www.dehora.net/journal/2005/01/dataabovethelevelofasinglesite.html" ]
               ) ] .

I noted a few things in doing this:

  1. The namespace should end in a ‘/’ or ‘#’ to make things prettier.
  2. It’d be nice if the version was an integer. I’m curious about its structure and semantics: does it need to be 3 digits? Does it represent some sort of compatibility assurance?
  3. It’d be nice if the ordered attribute could trivially be an xsd:boolean “true”/“false” rather than “yes”/“no”.
  4. The published and changed dates probably don’t need to be wrapped in bnodes [owl:ObjectProperties], but could more directly be straight data-type properties [owl:DatatypeProperties].
  5. The item-list construct doesn’t map directly.

The first three are more spec quibbles, and the 4th is just a modeling question. But the last is a bit more interesting. I’ve seen the mismatch before and don’t yet fully understand it, but it’s definitely a sticking point about RDF…

… in any case, above I’ve inserted the items property, allowing the list bnode to hold its properties [ordered and href], and the “striping” to work out in the RDF model. It does make more sense to me this way: the list has 3 properties: ordered, href and its children/items…

I believe the primary mismatch is that in RDF there’s no notion of implicit children or containment. That is, in XML there is an un-named “hasA” or “isA” relation of a node being inside another node. In RDF, all relations are typed via a property.
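As a rough illustration of what “all relations are typed” means for a list, the sketch below spells out the chain of rdf:first/rdf:rest statements that an RDF collection (the parenthesised list in Turtle) stands for. The helper name `collection_triples`, the `_:cN` blank-node labels and the `_:list` subject are invented for the example; only the rdf: namespace and the first/rest/nil terms come from RDF itself.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def collection_triples(items, prefix="_:c"):
    """Return (head, triples) encoding items as an rdf:first/rdf:rest chain."""
    if not items:
        return RDF + "nil", []
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        # Each cell names its value and, explicitly, the cell that follows it.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", item))
        triples.append((node, RDF + "rest", rest))
    return nodes[0], triples

head, triples = collection_triples(
    ["original post", "Danny's response", "followup to Danny"]
)
# The containment that XML leaves implicit needs a named property in RDF.
list_triple = ("_:list", "i:items", head)
print(list_triple)
print(len(triples))
```

Three child elements in XML turn into seven statements here: one named relation from the list to the head of the chain, plus two per item.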

In XML

The second question can be re-phrased as “what’s the minimum set of changes that need to be made in order for the presented example to be parsed as RDF/XML?” The answer is: very few, kinda.

Bill de hÓra jsled rdf http://www.w3.org/RDF/ markup http://www.w3.org/XML/

Here, I’ve inserted parseType="Resource" in a couple of places, as well as parseType="Collection" on the items element; this last one prevents rapper from parsing it correctly as RDF/XML, but I think it’s a bug in rapper, actually.

Now, what would happen if parseType="Resource" was the default? And there was a simpler way to specify collection-style parsing? I fear that the constraints imposed by RDF would continue to retard its adoption, but it might make for something a lot more palatable.

splitting semantic hairs

Adam Bosworth’s excellent ISOC’04 talk has inspired some apparently-indignant comment. But his recent post rubs me the wrong way.

There’s a pretty common conversational pattern that emerges in the RDF vs. XML discussions stemming from the fact that the RDF people are never really talking about XML. The discussion is really decomposable along two axes: RDF/XML vs. “vanilla” XML and RDF vs. no-DescriptionFramework: between the particular data-modeling framework of RDF vs. an ad-hoc or informal one.

It’s certainly easy to invent namespaces and to coin terms — for both things and thing-relations — within those namespaces. It’s also easy to declare how some piece of XML should look … what should be tags and what should be attributes and what they should contain in order to represent a particular structure of information. But, yes, that’s the easy part. The hard part is coming to agreement on those items.

This complexity comes in two forms: syntactic and semantic. RDF looks to focus on expressing semantics at the expense of some human readability of the syntax. XML is all about a particular verbose syntax, and says nothing (or, maybe, the wrong things) about the semantics.

Semweb proponents believe that the simple S-P-O and URI framework encourages interoperability [my properties can be in-relation-to your subjects if we both agree that subjects are URIs] and representational complexity [i.e., graphs, not trees], and will thus allow conversations to focus on the actually-hard part: the modeling and particulars of the terms involved.
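A toy sketch of the interoperability claim, with plain Python tuples standing in for triples. The `http://example.org/` URIs and the `rating` property are hypothetical; only the Dublin Core namespace is real. The point is simply that two independently written descriptions combine by set union, with no schema negotiation, because they name their subject with the same URI.

```python
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/terms/"          # hypothetical vocabulary
PAGE = "http://example.org/posts/34"      # hypothetical resource

# Two graphs written by different parties about the same resource.
publisher_graph = {(PAGE, DC + "title", "Well!")}
reader_graph = {(PAGE, EX + "rating", 9)}

# Merging is set union: statements about the same URI simply accumulate.
merged = publisher_graph | reader_graph

def describe(graph, subject):
    """Collect every property/value pair stated about one subject."""
    return {p: o for s, p, o in graph if s == subject}

print(describe(merged, PAGE))
```

Had the two parties used their own local identifiers for the page, the union would still succeed but the two statements would never meet, which is the agreement the URI convention buys.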

A lot of the syntactic issues shouldn’t be issues: it doesn’t matter if you put the title in an attribute or in an element. It does matter if you leave out a referent. If you want to represent something that’s not naturally acyclic … well … you should be able to. If I want to further describe some resource on the web, I should be able to, using its URI:

    @prefix rev: <http://www.purl.org/stuff/rev#>.
    @prefix foaf: <http://xmlns.com/foaf/0.1/>.
    @prefix dc: <http://purl.org/dc/elements/1.1/>.

    <http://www.adambosworth.net/archives/000034.html>
      dc:title "Well!" ;
      dc:creator [ a foaf:Person; foaf:name "Adam Bosworth" ] ;
      rev:hasReview [ a rev:Review;
                      dc:creator [ a foaf:Person; foaf:nick "jsled" ];
                      rev:rating 9 ].

Perhaps more importantly, however, you want to leverage the many person-years that have gone into well-crafted specs like FOAF and Dublin Core so that you don’t have to re-invent the wheel; and you don’t want to re-invent it, because doing so forces you to re-negotiate those agreements all over again.

Let’s postulate a Resource-and-Site-Summary format … RaSS … that has an open no-DF extension policy: in order to define extensions, you simply craft a new namespace and a set of terms in that namespace plus some rules about what’s an element vs. an attribute, publish it, get it noticed and get tool support built to understand the particular semantics it defines.

Note that while something like Relax-NG or XMLSchema can help with the syntactic definition, the semantic description is mostly in human-readable text.

Once you do that enough, you’ll want some reasonably standard way to define such extensions.

Moreover, you’re going to need to enforce some consistency in how things are modeled in order to promote interoperability. Maybe a regular pattern for statements. Referring to things at multiple places in the same document. Lexical interpretation of datatypes. Domain/range constraints. Relation cardinality constraints.
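As one small example of what such a constraint amounts to once statements follow a regular pattern, here is a hedged Python sketch of a maximum-cardinality check over subject/property/value tuples. The function name, the `ex:` terms and the sample data are all invented; this is not how OWL reasoners work (they infer rather than validate), just a picture of the kind of rule a no-DF format would eventually have to write down somewhere.

```python
from collections import Counter

def max_cardinality_violations(triples, prop, limit=1):
    """Return the subjects that state `prop` more than `limit` times."""
    counts = Counter(s for s, p, _ in triples if p == prop)
    return sorted(s for s, n in counts.items() if n > limit)

# Hypothetical data: a list should say at most once whether it is ordered.
statements = [
    ("ex:list", "ex:ordered", "yes"),
    ("ex:list", "ex:ordered", "no"),
    ("ex:other", "ex:ordered", "yes"),
]

violations = max_cardinality_violations(statements, "ex:ordered")
print(violations)
```

The check is only this short because every statement has the same three-part shape; with ad-hoc XML, each extension would need its own traversal code before a rule like this could even be expressed.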

At the end, I bet what you end up with is a lot like RDF, RDF schema and OWL.

Or perhaps it’s not.

But I’m in agreement with Danny, here: structured information is better than unstructured, and RDF is a simple and compelling way to structure data.

Alternatives?