March 12, 2004


I've been meaning to post about what I do for a living for a while now. Not because anyone has ever asked, of course, but because they really should and it hurts my feelings that they never do. Actually, that's not true, there's nothing worse than talking to someone and watching their eyes glaze over to the point that you'd like to dip your doughnut in their orbit.

But, here, safely ensconced far away from the glazing or the equally likely furtive glances searching for an escape route, I can write about my job to my heart's content.

But seriously, I'm actually working on some pretty cool stuff that I've wanted to post about for a while. Seriously.

The main research project that I've been working on for a while is on metamodeling and meta-metamodeling. While it can be used for mundane things of course, metamodeling is actually pretty cool. It captures a lot of the interesting parts about recursive and self-referential languages, as well as making you think about ontological frameworks.

So what is metamodeling (and its big brother meta-metamodeling)? Well, there are several relevant standards in this area, including the OMG's Meta-Object Facility (MOF), Unified Modeling Language (UML), and Common Warehouse Metamodel (CWM). But it's simplest to start off with a description of data and metadata. Everyone knows what data is (are). It's meaningful information, organized in some fashion, able to be retrieved. Perhaps stored in a database. But for now, let's think of the words of a book. Now everyone is also probably familiar with metadata, even if they are not familiar with that term. Metadata is data about data. So if the books are the data, then the card catalog that describes the books is the metadata. Each piece of metadata (each card in the catalog) describes a book: its title, author, publisher, etc.

Now, for each card, a library might have multiple copies of the book (and certainly multiple copies exist in the world). Each of these copies is known as an instance of the metadata on the card. Now this is all well and good because we can all see how this is useful. I don't have to go look at the book to get some basic information about it, I can use the catalog and it can tell me if I care about the book or not. We'll just think hard about what information to put on the cards to make it useful.

So you might think that we can stop there. But it soon became obvious that the metadata on the cards was just more data and we might be in the position to want to describe what's on the cards. In fact, we may have two different card catalogs that have different information in them. So one might contain author, publish date, title and the other might have author, editor, page count, publisher. So, if I need to find books with the word "Piddle" in the title that are over 500 pages long, I need to know whether the card catalog has those fields in it before I make the trek to that library.

So now we can imagine a catalog of card catalogs that has a card for each library (and its card catalog), with each card listing the name of the card catalog, its location and the fields that are on its cards. Now we have meta-metadata. We could even get fancy and have a card in this meta-catalog, not for each card catalog, but for each type of card in a catalog. So we'd have a card that described what magazine cards looked like, and one that described books, one for anthologies, etc.
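For the programmers in the audience, here is a rough Python sketch of that catalog of catalogs. The library names, locations and the helper function are all made up for illustration; only the two field lists come from the example above.

```python
# M2: the catalog of catalogs. One "card" per library, listing which
# fields appear on that library's cards. Names and locations are invented.
meta_catalog = [
    {
        "name": "Downtown",
        "location": "Main St",
        "fields": {"author", "publish_date", "title"},
    },
    {
        "name": "Uptown",
        "location": "Hill Rd",
        "fields": {"author", "editor", "page_count", "publisher"},
    },
]


def catalogs_supporting(required_fields, meta_catalog):
    """Return the names of the catalogs whose cards carry every required field."""
    return [
        entry["name"]
        for entry in meta_catalog
        if required_fields <= entry["fields"]  # subset test on sets
    ]


# A title search only needs the "title" field.
title_only = catalogs_supporting({"title"}, meta_catalog)

# The "Piddle" query needs both the title and the page count.
piddle_query = catalogs_supporting({"title", "page_count"}, meta_catalog)
```

With these two particular catalogs, the title-only search points at one library and the "Piddle" query comes back empty, since neither catalog records both the title and the page count. Which is exactly the sort of thing you'd want to know before making the trek.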

So you can see that we can keep doing this ad infinitum as necessary, and in fact, computer programmers talk about a meta stack. Here the levels are labelled M0 (data), M1 (metadata), M2 (meta-metadata), M3 (meta-meta-metadata).

To make things a bit more complicated, the meta- prefix is really relative, meaning that meta-metadata is both data (seen from its level, M2) and metadata about metadata (seen from the level M1), as well as meta-metadata (from M0).

Each object in a layer is considered an instance of an object in the layer above. Each book an instance of a card. Each card an instance of the meta-card that describes it.
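A minimal sketch of that instance-of chain, assuming we represent every object as a small record that points at the thing one layer up. The Element class, the chain function and the example book are my own inventions for illustration, not part of any standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """One object in the meta stack, linked to the object it is an instance of."""
    name: str
    level: int  # 0 for M0 (data), 1 for M1 (metadata), and so on
    instance_of: Optional["Element"] = None


# Build the stack from the top down: meta-card, then card, then book.
meta_card = Element("Book card layout", 2)
card = Element("Card for Moby-Dick", 1, instance_of=meta_card)
book = Element("Copy 3 of Moby-Dick", 0, instance_of=card)


def chain(element):
    """Walk upward from an element, collecting a label for each layer visited."""
    labels = []
    while element is not None:
        labels.append(f"M{element.level} {element.name}")
        element = element.instance_of
    return labels


layers = chain(book)
```

Walking up from the book visits the card and then the meta-card, and stops there only because nothing in this sketch describes the meta-card.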

To make things more confusing (but no more complicated), computer programmers sometimes use the word model to mean the same thing as metadata, so you might call the meta-metadata a metamodel instead. Sometimes (see UML above) they will also call a metamodel a modeling language since it describes models.

Now, it turns out that you don't need to actually build a stack that's infinitely high. In fact, most people stop at M3 – the meta-metamodel layer. And the reason for this is that they define the meta-metamodel in such a way that it defines metamodels. And then they define the meta-metamodel itself as an instance of itself, recursively. This is where it gets complicated, and you (at least I) start getting confused.
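If the self-describing top layer sounds like a cheat, it may help to know that Python closes its own stack with the same trick. This is an analogy, not MOF: Python's stack happens to be shorter, and the Card class and climb function below are invented for illustration. The one real fact being leaned on is that Python's built-in type is an instance of itself.

```python
class Card:
    """A class describes its instances, much as a meta-card describes cards."""


def climb(obj, max_steps=10):
    """Follow type() upward from obj until something is its own type."""
    seen = [obj]
    for _ in range(max_steps):
        parent = type(seen[-1])
        if parent is seen[-1]:
            break  # this layer describes itself, so the stack is closed
        seen.append(parent)
    return seen


some_card = Card()
stack = climb(some_card)

# The built-in `type` is its own type, so there is nothing above it.
top_is_self_describing = type(type) is type
```

Here the walk goes from the instance to Card to type and then stops, because asking for the type of type just hands you type again.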

Anyway, more on this later....

Posted by richard at March 12, 2004 10:03 PM

Can you have better meta-data about your postings so that I don't have to read three paragraphs in to find out that you're just rambling on about metadata?

Posted by: Mike F. at March 14, 2004 05:55 PM

Question: why does there have to be three levels? If you're going to go recursive, you just need two levels (data and metadata.) Isn't this the way that XML works?

Posted by: Michael Weiksner at March 16, 2004 04:22 PM

Weiksner, I think you are right, theoretically. All you need is data and metadata. But it helps to go higher if you want to be practical.

MOF is a fairly simple metamodel which is designed specifically for modeling metamodels (including itself). It's simple enough to have a defined set of semantics that allows tools to use it to instantiate any model described by it (including UML). On the other hand, it's too simple to fully model any given domain, since it has limitations that make it easier to implement.

So, MOF is a metamodel (and also a meta-metamodel) that is designed to describe metamodels. UML (a metamodel described by MOF) is a much more complicated metamodel optimized for building object-oriented computer programs. A specific object model (say for a word processor) captures the specifics of a particular business process or problem, and has specific semantics again. An instantiated object (say a Paragraph object in the word processor) has specific data relevant to that problem domain.

So, it's mostly practicality that drives you to four levels, not theory.

Posted by: richard at March 16, 2004 07:29 PM

I rarely get a chance to say this, because it's really a relative term, but you guys are dorks.

Posted by: Mike F. at March 17, 2004 07:16 PM

Careful, Mr. Mathematician, I've seen the pictures of your "method" on your web site.

Posted by: richard at March 17, 2004 08:30 PM