Turning a Blog Into a Semantic Web

So, I've finally done it. I've turned Ishbadiddle into a semantic web, of sorts. 3.75 years of blogging, 2,341 posts, have all been imported, categorized, and coded. (With help from our posters of course!) There are now 1,385 keywords in our subject index, covering everything from Abu Ghraib to Zombies.

Why bother? Wouldn't it just be easier to leave all the old posts alone? Well, of course it would be easier. But I wanted to fix what I see as a fundamental problem with blogs.

At heart, a blog is just a database. (I am indebted to Sun for our many conversations on this topic that helped push my thinking on blog organization.) URLs, post text, authors, comments -- it's all just data, and in theory we can slice it any way we want to. But most blogs only divide up the information by time. Which is useful, perhaps, for a personal diary -- what was I thinking about last June? -- but for a reader, it's probably the least interesting way of reading. When was the last time you casually read a blog's archives? Generally, if it's not on the front page, it's gone.
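To make the "it's all just data" point concrete, here's a rough sketch of slicing the same posts three different ways. The sample posts, the field names, and the `slice_by` helper are all made up for illustration; this is not Movable Type's actual schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample posts; field names are assumptions, not MT's schema.
posts = [
    {"title": "Abu Ghraib photos", "author": "poster1",
     "category": "Current Events", "date": date(2004, 5, 3)},
    {"title": "Zombie movie roundup", "author": "poster2",
     "category": "Culture", "date": date(2004, 6, 12)},
    {"title": "Gmail conversations", "author": "poster1",
     "category": "Technology", "date": date(2004, 6, 20)},
]

def slice_by(posts, key):
    """Group post titles by whatever key function we're handed."""
    groups = defaultdict(list)
    for post in posts:
        groups[key(post)].append(post["title"])
    return dict(groups)

# Same data, three different cuts.
by_author = slice_by(posts, lambda p: p["author"])
by_category = slice_by(posts, lambda p: p["category"])
by_month = slice_by(posts, lambda p: p["date"].strftime("%Y-%m"))
```

Time is just one key among many here; swapping the key function is all it takes to get a different view.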

Worse yet, everything is in reverse chronological order, leading to what Eric Meyer calls the Memento effect: "Reading a weblog is like watching Memento, which I agree was a cool movie, except all weblogs are like that so it's as if every single movie released in the past seven or eight years was structured exactly like Memento."

This is one reason why we moved Ish over to Movable Type. With the old Blogger system, we were restricted to the reverse-chronology diary mode by default. MT lets us slice up Ishbadiddle by Author, by Category, and yes, by Month.

The category system is an essential organizational tool -- if you just want current events, or to catch up on your friends, or to read our peculiar cultural views or our thoughts on technology, or just the wacky stuff -- you can narrow down Ishbadiddle into these thematic chunks. Useful? Sure. But at best, it's only creating "sub-blogs" (sublogs?). These still don't get at the fundamental problem with blogs.

The Fundamental Problem with Blogs

The problem is that the information in blogs isn't organized in a way that gives context to what's been written. If we're writing about Iraq, or turbans, or Buffy the Vampire Slayer, it helps to know where we've come from. A good blog is like a cocktail party, minus the drinks and the funny paper umbrellas. But blogs only give you the last statement of the conversation. For those of you who use Gmail, think of the difference between receiving a regular email (where the thread is either truncated or reverse-ordered), and seeing your email presented as a conversation.

One solution would be to increase the number of categories. But for a general purpose blog like Ishbadiddle, that number would get unwieldy. (Did I mention that there are 1,385 keywords?)

Why not just use the Search Function?

The search function assumes that you know that there's something to look for. Plus it's imprecise. A concept may not appear in a post, but be directly relevant to it. Or a word may appear in a post but have nothing to do with it. For instance, you'll find a post on Dog Day Afternoon and Harken Energy in a search for "Dog", but not in the subject listing for "Dogs".
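Here's a toy illustration of that difference, assuming a plain case-insensitive text search on one side and a lookup on assigned keywords on the other. The post bodies, the keywords I've given each post, and both function names are invented for the example.

```python
# Hypothetical posts; bodies and keyword assignments are made up.
posts = [
    {"title": "Dog Day Afternoon and Harken Energy",
     "body": "A movie title collides with an energy company.",
     "keywords": ["Movies", "Harken Energy"]},
    {"title": "Walking the beagle",
     "body": "Took the pup to the park.",
     "keywords": ["Dogs"]},
]

def full_text_search(posts, term):
    """Match any post whose title or body contains the term."""
    term = term.lower()
    return [p["title"] for p in posts
            if term in (p["title"] + " " + p["body"]).lower()]

def subject_listing(posts, keyword):
    """Match only posts that someone tagged with the keyword."""
    return [p["title"] for p in posts if keyword in p["keywords"]]

search_hits = full_text_search(posts, "Dog")
subject_hits = subject_listing(posts, "Dogs")
```

The text search finds the movie post (the word is there, the subject isn't) and misses the beagle post (the subject is there, the word isn't). The subject listing gets both right, but only because a human did the tagging.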

Enter the Semantic Web

My thinking about the Semantic Web was influenced by Paul Ford's piece on the subject, which imagines the power of Google harnessing the Semantic Web to make even more money. There's a good article on the Semantic Web on Wikipedia. Basically, it's adding metadata (data about the data) to web pages. In our case, it's simply adding "subject" data to each blog post, and then harnessing that to create an index of posts that relate to that subject. Think of it this way: the Category system is like the Table of Contents of a book, listing chapter headings. The Keyword system is like the Index of a book, one that is constantly updated.
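The book-index analogy maps pretty directly onto what programmers call an inverted index. A minimal sketch, assuming each post carries a list of keywords; the post IDs, titles, and the `build_subject_index` name are hypothetical, and this is not how the site itself does it.

```python
from collections import defaultdict

# Hypothetical posts keyed by a made-up ID.
posts = {
    101: {"title": "Prison photos", "keywords": ["Iraq", "Abu Ghraib"]},
    102: {"title": "Night of the living blog", "keywords": ["Zombies"]},
    103: {"title": "Election and the war", "keywords": ["Iraq"]},
}

def build_subject_index(posts):
    """Map each keyword to the sorted IDs of the posts tagged with it."""
    index = defaultdict(set)
    for post_id, post in posts.items():
        for keyword in post["keywords"]:
            index[keyword].add(post_id)
    # Alphabetize the keywords, like the back of a book.
    return {kw: sorted(index[kw]) for kw in sorted(index)}

subject_index = build_subject_index(posts)
```

Rebuilding the index whenever a post is added or retagged is what would keep it "constantly updated."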

The advantage of such a classification system is that it is open-ended and bottom-up. With 19 different posters here, it would be impossible to impose a complex hierarchical classification system. The data's too unruly. There would be "too many notes."

But for the bottom-up system to work, it needs a few simple rules:

Follow these simple guidelines, and semantic goodness will spread throughout the blog.

Where is this going?

First, the system is entirely too slow. It's using Movable Type's native search function, which is slow to begin with. Ideally we'd create an index which would be much faster, but I lack the programming chops to do this. Second, using the search function gives us some funny results on short keywords -- the Art index also includes posts on Arthur C. Clarke, Art Spiegelman, and Artificial Intelligence. Gotta fix that.
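My guess is that the Art problem comes from the search matching "Art" anywhere inside the keyword text. I haven't confirmed how MT's search does its matching, so treat this as a sketch of the likely cause and one possible fix (whole-keyword comparison), with made-up post titles and function names.

```python
# Hypothetical post titles mapped to their assigned keywords.
keywords_by_post = {
    "Gallery opening": ["Art"],
    "Rendezvous with Rama": ["Arthur C. Clarke"],
    "Maus": ["Art Spiegelman"],
    "Chatbots": ["Artificial Intelligence"],
}

def substring_lookup(keyword):
    """Assumed current behavior: match the term anywhere in a keyword."""
    needle = keyword.lower()
    return [title for title, kws in keywords_by_post.items()
            if any(needle in kw.lower() for kw in kws)]

def exact_lookup(keyword):
    """Possible fix: the whole keyword has to match, ignoring case."""
    needle = keyword.lower()
    return [title for title, kws in keywords_by_post.items()
            if any(needle == kw.lower() for kw in kws)]

loose = substring_lookup("Art")
strict = exact_lookup("Art")
```

With exact matching, Art Spiegelman only turns up under "Art" if someone also tags the post "Art", which seems like the right default.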

Third, I would really like to have a graphical interface to show the connections between subjects and posts. I've looked into TouchGraph, which has a really neat interface, and can take an XML input. But again with the lack of programming chops. Fourth, a count of how many posts are assigned to each keyword. Fifth, it would be really cool to geographically map place-related keywords...
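The fourth wish, at least, is small once the keywords are out of the blog and into a list. A sketch using invented keyword lists, one inner list per post:

```python
from collections import Counter

# Hypothetical keyword lists, one per post.
post_keywords = [
    ["Iraq", "Abu Ghraib"],
    ["Iraq"],
    ["Zombies"],
    ["Iraq", "Zombies"],
]

# Count how many posts carry each keyword.
keyword_counts = Counter(kw for kws in post_keywords for kw in kws)

# Most-used keywords first.
ranked = keyword_counts.most_common()
```

Getting the keyword lists out of the blog in the first place is the part that would need real programming, and I'm not assuming anything here about how that export would work.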

But that's all for later. For now, enjoy the blog. Let me know how we can improve the coding. And if you're interested in doing something like this on your own blog, let me know, I'd be happy to help.

Update: I was talking about this with Debbie, who knows a thing or two about archives. She suggested that the full list of keywords is too long to be browsable, but that a list of keywords for each category might be useful for the reader. So, now from the Category list you have the option to either see all the posts, or just browse the subjects covered for that Category. I must say, a brilliant suggestion, and not just because she's my wife.
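In data terms, the per-category subject list is just the set of keywords used by the posts in that category. A minimal sketch with made-up posts and a hypothetical `subjects_by_category` helper; the live version is done in MT templates, not like this.

```python
from collections import defaultdict

# Hypothetical posts; categories and keywords are illustrative.
posts = [
    {"category": "Current Events", "keywords": ["Iraq", "Abu Ghraib"]},
    {"category": "Culture", "keywords": ["Zombies", "Buffy the Vampire Slayer"]},
    {"category": "Current Events", "keywords": ["Iraq", "Turbans"]},
]

def subjects_by_category(posts):
    """Collect the distinct keywords used within each category."""
    subjects = defaultdict(set)
    for post in posts:
        subjects[post["category"]].update(post["keywords"])
    return {cat: sorted(kws) for cat, kws in subjects.items()}

category_subjects = subjects_by_category(posts)
```

Each category ends up with a short, browsable list instead of the full 1,385.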

2nd Update: Peter Merholz argues for "free tagging" as opposed to "inflexible top-down approaches." And LukeW proposes using Tufte's sparklines as a way to visualize the narrative sequence of related blog entries.

3rd Update: I added a list of recent subjects to the sidebar. And I put the code recipe up on the Movable Type support forum if you're interested in doing something similar.

One More Thing! The coding also puts the keywords as subject meta-tags on each entry's page, according to the Dublin Core standard.
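For the curious, this is roughly what those tags look like. The sketch follows the common convention for embedding Dublin Core in HTML (a `schema.DC` link plus one `DC.subject` meta element per keyword); the exact markup our templates emit may differ, and the function name is made up.

```python
from html import escape

def dublin_core_subject_tags(keywords):
    """Build one DC.subject meta element per keyword."""
    # Declares the "DC." prefix used in the meta names that follow.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for keyword in keywords:
        # Escape &, <, > and quotes so the attribute stays valid HTML.
        content = escape(keyword, quote=True)
        lines.append(f'<meta name="DC.subject" content="{content}">')
    return "\n".join(lines)

tags = dublin_core_subject_tags(["Semantic Web", "Blogs & Blogging"])
```

These lines would go in the `<head>` of each entry's page.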

And again: I've written up a better tutorial on Learning Movable Type.



M E-L posted this on October 22, 2004 3:29 PM

This post is filed under: Blogs & Blogging, Featured Posts, Site News
Comments
Kerim Friedman wrote:

Very exciting. I look forward to seeing how this develops!

Do you know about nu.tritio.us? It is a great way to encourage del.icio.us users to use similar tags to other users, by showing them existing tags for bookmarked items, as well as highlighting their own more popular tags.

Comment #1 :: link :: October 25, 2004 12:20 AM :: homepage
M E-L wrote:

del.icio.us is intriguing -- but it only encourages subject tagging for links, not for posts. The use of the hyperlink as the coin of the realm is natural, I suppose (see daypop, blogdex, etc.) but I think in some ways it devalues actual writing. In other words, if a blog falls in the forest, but no one links to it, did it make a sound?

Comment #2 :: link :: October 27, 2004 3:48 PM :: homepage
amoeda wrote:

I love what you've done with this data structure! The combination of top-down (filing) and bottom-up (indexing) tagging is the best thing about it--a partial remedy for the freetagging dilemma Peter talks about. And free-associative browsing by related categories is pretty fun. But there could be lots of other interesting ways to relate categories, keywords, posts, people, places and time to each other. A few I'd like to see:

--I'd like to see how my (or some other poster's) interests have evolved over the years, as reflected by participation on Ish. (Of course, this feature requires me to accept a certain amount of dataveillance, so maybe personal interest mapping should be an opt-in thing.) This is a case where a visualization tool could be useful... I must look into this Touchgraph stuff.

--Zeitgeists: Looking at Ish's collective interests over a given span of time; comparisons with other zeitgeists

--A mapping of keywords to the geographic locations of posters (Again with the dataveillance. I'm just a born snoop, I guess.)

Mostly these would not result in "useful" improvements to Ish, in the sense of making the content more accessible to the average reader. But it might reveal some interesting trends, or pseudotrends.

What's yours is mined,
--andrea

p.s. A couple more interesting perspectives: David Weinberger's observations on how all data is metadata, and Warren Sack's work on conversation mapping

p.p.s. Pessimistic note: Personally, I tend to reply a lot here and very rarely post anew. This is because I constantly forget my $%@#$!@$ password, but also because I am too lazy to tag my posts and thus prefer glomming on to pre-tagged content. This is why we need smarter machines that will apply contextual cues to make tagging automatic or at least less labor-intensive. Until then, everybody tag!!!

Comment #3 :: link :: October 27, 2004 5:46 PM
M E-L wrote:

You like it! You really really like it! I'm so glad, you being the professional here and all...

Yes, I think there's more interesting data to extract (and visualize) from this. Touchgraph is good for showing links between things, but not very good for showing time (your zeitgeist idea, for instance). Also, as the data currently sits, there's no way to show frequency -- how many posts are associated with a keyword, and when those posts occur. I need someone who can program MySQL to do that; know any programmers?

Oh, and I'd love to map the meatdata to the metadata. Question: should a post's location relate to the poster (Andrea is in Brooklyn) or the subject (Andrea is writing about Iraq)?

I think I've seen that Sack map before. Very cool. Is it something we could adapt somehow?

As to your last point -- if you haven't yet, get a "post to MT Weblog" bookmarklet for your browser. That way when you're on a page, all you have to do is press a button and you can be blogging about it. You can get one here. As for the difficulty of post-tagging, well now that it actually does something you'll want to do it all the time, right? :) No, seriously, you can always go back and tag things later. Don't let that stop you from blogging, Ms. Moed!

Comment #4 :: link :: October 28, 2004 10:42 AM :: homepage
M E-L wrote:

Oh, and for Yet Another Way To Cut The Data, here's how to turn a Movable Type blog into a web forum.

Comment #5 :: link :: October 28, 2004 11:26 AM
amoeda wrote:

More a journeyperson than a professional, but thanks! I can write basic MySQL queries (and I happen to be a close personal wife of someone who can write advanced ones)--we should get together and hash out a spec sometime.

Re: relating to poster's location or post subject location... well, either or both would be cool, keeping in mind that the first is a one-to-one point mapping while the second could be one-to-many region mapping, and thus more complicated to plot.

And on the visualization tip, one more del.icio.us companion widget: extisp.icio.us.

Comment #6 :: link :: October 28, 2004 10:14 PM
M E-L wrote:

Also on the visualization-of-blogs front, check out 5 years of plasticbag posts

Comment #7 :: link :: November 5, 2004 12:30 PM
Sean wrote:

Wow, I am fascinated! Baffled but fascinated - I am going to need to reread this post and the one on the MT forum a few more (hundred) times but I very much like the ideas of what you are talking about.

Comment #9 :: link :: April 1, 2005 3:36 PM :: homepage
M E-L wrote:

Note to spammers who have been hitting this page: the system adds a "nofollow" tag to your links, so Google will ignore them, meaning you are wasting your time. Go away.

Comment #10 :: link :: August 21, 2005 12:02 PM