Internet R/Evolution & Taxonomy as the Glue for the Intranet

October 31, 2007 Mark Cahill 2 comments

(I wrote the text below, then came across another great Mike Wesch video that I think better captures the essence of what I’m trying to say…probably better than my own words.)

Information R/Evolution

Last week I mentioned in two separate posts that the single biggest complaint for Intranets is “I can’t find what I need.” Obviously that points to a need for a strong search function, but when we dig a little deeper, there’s a much better solution: taxonomy.

You see, the problem with search is this: if you have a massive library of documents, it’s very hard to search effectively. It’s not that a simple query doesn’t get results, it’s that it gets too many.

“Congratulations, we’ve found 347,234.5 documents that meet your criteria.”

Taxonomy allows us to represent our data in a manner that makes more sense, a manner that has intelligence and logic behind it.

Stepping back, let’s take a look at the Wikipedia definition of Taxonomy:

Taxonomy is the practice and science of classification. The word comes from the Greek τάξις, taxis, ‘order’ + νόμος, nomos, ‘law’ or ‘science’. Taxonomies, or taxonomic schemes, are composed of taxonomic units known as taxa (singular taxon), or kinds of things that are arranged frequently in a hierarchical structure, typically related by subtype-supertype relationships, also called parent-child relationships. In such a subtype-supertype relationship the subtype kind of thing has by definition the same constraints as the supertype kind of thing plus one or more additional constraints. For example, car is a subtype of vehicle. So any car is also a vehicle, but not every vehicle is a car. So, a thing needs to satisfy more constraints to be a car than to be a vehicle.

Okay, that sounds complex. But the real magic of the thing is this: to work best, at least to start, taxonomies need to be simple. You see, my experience in dealing with this stuff is that it’s virtually impossible for a team of people sitting around a conference table drinking green tea and eating donuts to develop a taxonomy that will be all things to an enterprise. The best way, in my estimation, is to start with basic terms and subterms that describe your information, so that you get it into a basic structure. Over time, as you work with the taxonomy, the real nature of the thing and the real needs of the data will be exposed.
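To make "basic terms and subterms" a little more concrete, here is a minimal sketch in Python. The term names, document names, and the helpers is_a and browse are all made up for illustration; a real intranet would keep this in its content management system rather than in a dictionary.

```python
# Hypothetical starter taxonomy: each term maps to its parent term.
# None marks a top-level term. All names here are invented for illustration.
PARENT = {
    "HR": None,
    "Benefits": "HR",
    "Policies": "HR",
    "Engineering": None,
    "Specifications": "Engineering",
}

# Hypothetical documents, each filed under exactly one term.
DOCUMENTS = {
    "401k-overview.doc": "Benefits",
    "vacation-policy.doc": "Policies",
    "widget-spec.doc": "Specifications",
}


def is_a(term, ancestor):
    """Return True if term is the ancestor itself or sits anywhere beneath it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT[term]  # walk one level up the hierarchy
    return False


def browse(term):
    """List the documents filed under a term or any of its subterms."""
    return sorted(name for name, filed in DOCUMENTS.items() if is_a(filed, term))


print(browse("HR"))
```

Browsing "HR" pulls in everything filed under "Benefits" and "Policies" too, which is the parent-child relationship from the definition above: every "Benefits" document is an "HR" document, but not the other way around. Adding a term later is one new line in the dictionary, which is what lets the structure evolve with use.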

But wait, there’s more! Just because we think that a simple structure is best, that doesn’t mean that we can’t provide a means to allow users to enrich and enlighten. As Jeremy Liew suggests in his recent post, there are a couple of ways to expand upon the taxonomy, by relying on user-generated structure.

His items (excerpted in part, with my edits for space):

Tagging is the first approach, and its use has been endemic to web 2.0. Sometimes the tagging is limited to the author of the content, and other times any user can add tags to create a folksonomy.
The second approach is to solicit structured data from users.
The third approach to user generated data is the traditional approach to the Semantic Web.
- He quotes Alex Iskold on the semantic web:

The core idea is to create the meta data describing the data, which will enable computers to process the meaning of things. Once computers are equipped with semantics, they will be capable of solving complex semantical optimization problems.

Personally, I think the real solution for us is a mixture of item #1 and item #2. I’ve never been convinced of the capability of computers to make value judgements on content, and the best I’ve ever seen accomplished in auto-tagging schemes was approximately 70%, and that was in a controlled environment. As I’ve said before, while my computer may have gotten faster and has more memory over the years, I don’t see how it has necessarily gotten much smarter.

The term many are affixing to this is “Folksonomy.” Again, Wikipedia:

Folksonomy (also known as collaborative tagging, social classification, social indexing, social tagging, and other names) is the practice and method of collaboratively creating and managing tags to annotate and categorize content. In contrast to traditional subject indexing, metadata is not only generated by experts but also by creators and consumers of the content. Usually, freely chosen keywords are used instead of a controlled vocabulary.

By allowing our users to embellish the data via comments, reviews, ratings, etc., we’re finding out more information about our data. Additionally, by allowing our users to tag items, especially if we do it free form, as with Del.icio.us, they’re able to build their own construct over the data, a construct that they can share with others, and a construct which we can reuse later.
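As a rough sketch of what free-form tagging might look like under the hood, here is a small Python example. The user names, document names, and the functions add_tag and top_tags are hypothetical, and lowercasing tags is just one assumed way of merging near-duplicates; it is not how Del.icio.us or any particular product does it.

```python
from collections import Counter, defaultdict

# item -> how many distinct users applied each tag to it
tags_by_item = defaultdict(Counter)
# tag -> every item carrying that tag (the reusable "construct over the data")
items_by_tag = defaultdict(set)
# (user, item, tag) triples already recorded, so repeats are not double-counted
seen = set()


def add_tag(user, item, tag):
    """Record a free-form tag from one user on one item."""
    tag = tag.strip().lower()  # assumption: "Time Off" and "time off" are the same tag
    if not tag or (user, item, tag) in seen:
        return
    seen.add((user, item, tag))
    tags_by_item[item][tag] += 1
    items_by_tag[tag].add(item)


def top_tags(item, n=3):
    """Return the n tags most users agreed on for an item."""
    return [tag for tag, _ in tags_by_item[item].most_common(n)]


# Hypothetical users and documents, for illustration only.
add_tag("alice", "vacation-policy.doc", "Time Off")
add_tag("bob", "vacation-policy.doc", "time off")
add_tag("bob", "vacation-policy.doc", "hr")
add_tag("carol", "401k-overview.doc", "hr")

print(top_tags("vacation-policy.doc"))
print(sorted(items_by_tag["hr"]))
```

The second index is the part we get to reuse later: once enough people have tagged things "hr", that tag becomes a browsable collection nobody had to design up front.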

Okay, let me bring this home:

Data on our intranets needs a simple, but still rich and utile, structure.
Simple will work where complex will fail.
We need to plan for new ways in which our users can interact with our data.
By allowing users to interact, we’re allowing them to increase the value of our information.
By allowing users to interact, we’re encouraging them to share our data.
By allowing users to interact, we’re making our work easier.

Basically, we will involve users in making our data more useful. As they say, many hands make light work, and in my experience, structure works best when it’s been evolved through use, not dictated from on high.

Reading on Taxonomy:

8 Steps to Creating a Taxonomy – From the Information Management Journal (note that it takes 12 pages to explain the 8 steps…)

2 thoughts on “Internet R/Evolution & Taxonomy as the Glue for the Intranet”

Mukund Mohan says:

October 31, 2007 at 10:08 am

2 problems with Intranets, which I am not sure tagging solves.

1. too much data is still in people’s hard drives and there are not enough incentives to move it to the intranet (hoarding of information to feel important)

2. People still want information to “find them” rather than having to find information themselves. RSS syndication along with “persistent search” will help solve that. Automatic tagging may do that based on the “contents” of the document, PPT, etc.

Mark Cahill says:

October 31, 2007 at 10:21 am

Great points!

1. Incentives definitely are needed to get people to decrease the hoarding. I think that by moving to project pages, product pages, and even personal pages, people will become more likely to upload. If it’s a project document, you’d be forced…

2. RSS and persistent search are great ways to reach out. Also, use of internal feeds à la Del.icio.us can help to find users. It’s the fool who doesn’t watch his boss’s tag feed…;-)
