Information classification (Part III) - Folksonomies

18 March, 2007

In two of my previous posts I talked about taxonomies and folk taxonomies. Both are methods of classifying information: the former is highly structured, while the latter is embedded in a specific group's culture. In this post I'm going to look at a method of information classification that has no inherent structure - folksonomies.

So what is a folksonomy? In essence, it's the product of free-text tagging, done predominantly on the web. What I particularly like about folksonomies is their user-driven approach to organising information. Unlike a taxonomy, a folksonomy has no real structure, and the meaning of each tag is specific to the person who writes it.

Sites with folksonomies include two basic capabilities: they let users add “tags” to information and they create navigational links out of those tags to help users find and organise that information later.
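To make those two capabilities concrete, here's a minimal Python sketch of the data structure behind them: an index from each tag to the items carrying it, so that every tag doubles as a navigational link. The photo names and tags are made up, and folding tags to lower case is my own assumption; real sites normalise tags in their own ways.

```python
from collections import defaultdict


def build_tag_index(tagged_items):
    """Map each tag to the sorted list of items carrying it.

    tagged_items: dict of item id -> iterable of free-text tags.
    Tags are stripped and lower-cased (an assumption, not a rule).
    """
    index = defaultdict(set)
    for item, tags in tagged_items.items():
        for tag in tags:
            index[tag.strip().lower()].add(item)
    return {tag: sorted(items) for tag, items in index.items()}


def related_items(index, tag):
    """Follow the 'navigational link' for a tag: every item sharing it."""
    return index.get(tag.strip().lower(), [])


# Hypothetical photos and the tags their owners gave them.
photos = {
    "photo1": ["canberra", "lake"],
    "photo2": ["Lake", "sunset"],
    "photo3": ["canberra", "parliament"],
}
index = build_tag_index(photos)
print(related_items(index, "lake"))  # ['photo1', 'photo2']
```

Nobody designed the categories here: the index is simply whatever tags people happened to use.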

I’m just going to use two examples, Flickr and del.icio.us, but there are others, such as blogs, that use the same approach. In fact, this is a big growth area on the web at the moment, so take note – because very soon (IMHO) everyone will want to move in this direction in some way or other!

Flickr

Essentially, Flickr is a website for sharing photos. Users upload photos and classify them in their own words using keywords, or tags. Others can then search or browse the photos using those tags.

Del.icio.us

Delicious is, essentially, an online bookmark repository - very handy if, like me, you use different computers and have trouble keeping all your bookmarks synchronised. Each person categorises their bookmarks with tags. I can see all my tags and bookmarks, and also what other people are tagging and bookmarking. If I’m interested in knowledge management and use the tag ‘km’, then I can use Delicious to see what everyone else is tagging with km. That saves me doing Google searches to find what’s interesting and worth reading, because others have already done this for me!
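The "see what everyone else is tagging with km" idea boils down to counting across users. This is a rough sketch, not how del.icio.us actually works internally: the users, the placeholder URLs and the rule of counting each user only once per URL are all hypothetical.

```python
from collections import Counter

# Hypothetical shared bookmarks: (user, url, tags). URLs are placeholders.
bookmarks = [
    ("alice", "http://example.com/a", {"km", "wiki"}),
    ("bob", "http://example.com/a", {"km"}),
    ("carol", "http://example.com/b", {"km", "taxonomy"}),
    ("dave", "http://example.com/c", {"photos"}),
]


def popular_for_tag(bookmarks, tag):
    """Rank URLs by how many distinct users bookmarked them with `tag`."""
    counts = Counter()
    seen = set()  # (user, url) pairs already counted
    for user, url, tags in bookmarks:
        if tag in tags and (user, url) not in seen:
            seen.add((user, url))
            counts[url] += 1
    return counts.most_common()


print(popular_for_tag(bookmarks, "km"))
# [('http://example.com/a', 2), ('http://example.com/b', 1)]
```

The ranking is a by-product of other people's tagging: nobody had to decide in advance what "km" should contain.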

What do Flickr and Delicious have in common?

Sites that use tags build their classification from the ways their users think about their own information. No one prescribes how users should classify their own stuff; instead, over time, you get a picture of how the users collectively think about their information.

When to use a folksonomy

Some people make the mistake of trying to build a taxonomy when they don’t have a complete picture of the information that needs classifying. What they end up with is an incomplete way of organising their information, and when new information comes in they’ll probably have to redesign the taxonomy. The advantage of a folksonomy is that you don’t need to know everything before you start.

If you want to hand the classification of information over to your users, allowing them to describe things in their own terms means greater take-up. The last thing they want to do is use someone else’s world-view to classify and organise their own stuff.

So, if you trust your users enough to let them classify their own information, and you have the right tools to support it, then folksonomies could be for you.

Business benefits

As a knowledge/information management consultant, I tend to describe folksonomies to businesses as having the following benefits:

  • Low cost of development – people just start tagging
  • Low cost of infrastructure – uses existing low-cost, open source and often free internet technologies
  • No specialist taxonomists required – elimination of bureaucracies of cataloguers and indexers [1]
  • Increase in take-up – because you can organise and re-organise things your own way, you can tag things the way you want to, to suit how you work
  • No learning curve (it’s easy) – you don’t have to learn someone else’s view of the world. Users tag things the way they see the universe, not the way someone else sees it, which makes the author the authority on what he intended his work to be about.
  • Highly adaptive – because the user decides how to tag something and can use multiple tags
  • Doesn’t reinvent the wheel – users can make use of folk taxonomies or the shared vocabulary of existing social networks. You might even seed tagging by offering lists of existing terms from folk taxonomies. An employee-generated folksonomy could therefore be seen as an “emergent enterprise taxonomy”.

Some folksonomy advocates believe that it is useful in facilitating workplace democracy and in distributing management tasks among the people actually doing the work. As a psychologist, I know that it can increase acceptance of difference in the workplace – that there is “more than one way to describe the cat”, because multitudes of tags are possible and indeed welcome. And when you look at the folksonomy that has been built, well, let’s just say that it can reveal much about the mindset of your users: what they think is interesting and how they think about their own work.

Problems using folksonomies

There are lots of people who think folksonomies are the Spawn of Satan: that they are somehow inferior to ‘real’ classification schemes like taxonomies, that they’re unsystematic and, from an information scientist’s point of view, unsophisticated. Furthermore:

  • It’s imperfect - the tags used probably won’t ever be sufficient for an information classification specialist, like a librarian or a records manager. The tags only represent the author’s guess, in his own words, at what his content is about
  • Polysemy & synonyms – the tags don’t distinguish between words that have multiple and/or related meanings
  • Meta-noise – some tags might not be relevant to some users
  • Scalability – some suggest it gets messy when it gets big
  • It’s new - people are afraid to try new things sometimes

Folksonomies might be new, but they are quickly growing in popularity, particularly with users, because for once in their lives they have a tool that lets them describe their own work in their own way. Other people can then find information by using the tags in the folksonomy to navigate to all the other items with that tag. Yes, folksonomies have their weaknesses, but placed side-by-side with a taxonomy, they give you a great way of finding information.

In the traditional information classification space (or even information architecture space) we create lots of artefacts like site maps, navigation systems, and taxonomies. In creating all these things we really should be asking ourselves whether our information models are built on the assumption that a single way to organise things can suit all users, one IA to rule them all, so to speak.

IAs should ask themselves this important question because people often model information without thinking about the way that Joe Citizen thinks about information and the way he’s interested in seeing it. To Joe, information is often about relationships – how bits of information fit with other pieces of information in his universe.

This is a reminder that, as Thomas Vander Wal said, “the assumption that taxonomies are great and help all people find things by providing the authoritative terms is flawed. Taxonomies are always going to be less than perfect (and most often far less than perfect) for helping people find and re-find information they need because of their view of the world” [3].

While taxonomies are still vital for providing a foundation structure, we also need to develop solutions that help people like Joe Citizen, whose terms and vocabulary are left out of the taxonomy. Folksonomies can play nice with taxonomies because they let Joe see the universe both in his own way and in the official corporate way.
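One simple way to let both views coexist is a lookup from users’ own tags to official taxonomy terms, while keeping the original tags alongside. The mapping below is purely hypothetical; in practice someone has to build and maintain it, and any tag it doesn’t know about simply stays a personal tag.

```python
# Hypothetical mapping from users' own tags to official taxonomy terms.
TAG_TO_TERM = {
    "km": "Knowledge management",
    "knowledge-management": "Knowledge management",
    "recordkeeping": "Records management",
}


def describe(item_tags):
    """Return both views of an item: the user's tags and any official terms."""
    official = sorted({TAG_TO_TERM[t] for t in item_tags if t in TAG_TO_TERM})
    return {"user_tags": sorted(item_tags), "official_terms": official}


print(describe({"km", "meeting-notes"}))
# {'user_tags': ['km', 'meeting-notes'], 'official_terms': ['Knowledge management']}
```

Joe keeps his own words ("meeting-notes"), while the corporate view still sees the item under "Knowledge management".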

But is this enough? Is there anything else we can do in order to classify information so it is findable?

There is a way of classifying information whose underlying principle is making things ‘findable’ – Topic Maps – and it might provide a way to integrate all these things and give real meaning to knowledge in the way people, rather than machines, think about the relationships between bits of information.

… and that leads me to think I need to write another blog entry – Topic Maps … I guess I’ll write it up soon.

M

—-

[1] [2] David Weinberger. “Tagging and why it matters.” Retrieved November 10, 2006 from <http://cyber.law.harvard.edu/home/2005-07>.

[3] Thomas Vander Wal. Beneath the Metadata: Some Philosophical Problems with Folksonomy.


Information classification (Part II) - Taxonomy

4 February, 2007

A few weeks ago, I created a post on information classification - folk taxonomy flavour. This time around, I’m going to talk about a bugbear for a few people, taxonomies.

Taxonomies

In general, taxonomies differ from folk taxonomies because they claim to be disembedded from social relations, and thus objective and universal.

Again, rather than attempt to describe a taxonomy, let’s first look at some examples that call themselves taxonomies:

  • Dewey Decimal classification system (DDC)
  • Library of Congress
  • Alpha taxonomy
  • Keyword AAA
  • Taxonomy of Educational Objectives
  • Corporate taxonomy

Dewey decimal classification system (DDC)

Good ol’ Dewey is generally about books - a proprietary system of library classification with mutually exclusive categories. This works well for books because you can only really find them in one place (being a physical object and all). It has high level categories with what it calls recursive links to lower level categories.

000 – Computer science, information, and general works
100 – Philosophy and psychology
200 – Religion
300 – Social sciences
400 – Language
500 – Science
600 – Technology
700 – Arts and recreation
800 – Literature
900 – History and geography
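Because the classes are decimal, the hundreds digit of any Dewey number tells you which of the ten main classes it falls under. Here’s a small sketch of that lookup, assuming a bare Dewey number with no Cutter number or other suffix attached (real call numbers usually carry more than this).

```python
# The ten Dewey main classes, keyed by the hundreds digit of the class number.
MAIN_CLASSES = {
    0: "Computer science, information, and general works",
    1: "Philosophy and psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts and recreation",
    8: "Literature",
    9: "History and geography",
}


def main_class(dewey_number: str) -> str:
    """Return the main class for a bare Dewey number such as '999' or '599.9'.

    Assumes no Cutter number or other suffix is attached.
    """
    whole = int(dewey_number.split(".")[0])  # digits before the decimal point
    if not 0 <= whole <= 999:
        raise ValueError(f"not a Dewey class number: {dewey_number}")
    hundreds = whole // 100
    return f"{hundreds * 100:03d} - {MAIN_CLASSES[hundreds]}"


print(main_class("999"))  # 900 - History and geography
```

That’s the mutually exclusive structure at work: every number lands in exactly one main class, and anything outside 000-999 simply has nowhere to go.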

One problem that struck me hard about Dewey was the way it classifies religion. 200-289 largely represents Christianity, leaving only 290-299 for the ‘others’!? No wonder we have trouble with the other Abrahamic faiths of the world. This is obviously the world view of the West for the West.

And what happens if we discover other worlds? Dewey only uses 999 to cover ‘Extraterrestrial worlds’ for history and geography. One number is supposed to cover all that!?

Did someone screw up their taxonomy design process when they put this together? Well, both yes and no. Dewey was created in 1876 [1], when people’s world-view was quite different. Over 130 years later, the taxonomy is starting to show its age, because the number and variety of things it is supposed to classify have changed.

Overall, Dewey is widely considered to be theoretically inferior to other more modern systems which make freer use of alphabetical characters to produce shorter classmarks for concepts of equal complexity, though it continues to offer a more expressive format than the Library of Congress Classification developed shortly afterward.

Library of Congress

This taxonomy is similar to Dewey Decimal with a number of high-level categories:

A. General Works
B. Philosophy. Psychology. Religion
C. Auxiliary Sciences Of History
D. History: General And Old World
E. History: America
F. History: America, Local
G. Geography. Anthropology. Recreation
H. Social Sciences
J. Political Science
K. Law
L. Education
M. Music And Books On Music
N. Fine Arts
P. Language And Literature
Q. Science
R. Medicine
S. Agriculture
T. Technology
U. Military Science
V. Naval Science
Z. Bibliography. Library Science. Information Resources (General)

Two whole categories, E-F, just for American history, yet only one for everyone else? … makes me wonder.

Alpha taxonomy

Taxonomy, sometimes alpha taxonomy, is the science of finding, describing and naming organisms.

The taxonomy used is Linnean Taxonomy with 7 levels - Kingdom, Phylum, Class, Order, Family, Genus, and Species.

  1. Kingdom: Animalia (with eukaryotic cells having cell membrane but lacking cell wall, multicellular, heterotrophic)
  2. Phylum: Chordata (all animals with a notochord)
  3. Class: Mammalia (vertebrates with mammary glands that in females secrete milk to nourish young, hair, warm-blooded, bears live young)
  4. Order: Primates (collar bone, eyes face forward, grasping hands with fingers, two types of teeth: incisors and molars)
  5. Family: Hominidae (upright posture, large brain, stereoscopic vision, flat face, hands and feet have different specializations)
  6. Genus: Homo (s-curved spine, “man”)
  7. Species: Homo sapiens (high forehead, well-developed chin, skull bones thin)
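Those seven ranks behave like a fixed path from the most general level to the most specific. The sketch below stores each classification as a rank-to-name mapping and finds the lowest rank two organisms share. The cat is my own addition for comparison, and the helper names are hypothetical.

```python
RANKS = ("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Each classification maps every rank to a name, most general first.
human = dict(zip(RANKS, ("Animalia", "Chordata", "Mammalia", "Primates",
                         "Hominidae", "Homo", "Homo sapiens")))
cat = dict(zip(RANKS, ("Animalia", "Chordata", "Mammalia", "Carnivora",
                       "Felidae", "Felis", "Felis catus")))


def lineage(classification: dict) -> str:
    """Join the names from Kingdom down to Species, insisting on all seven ranks."""
    missing = [rank for rank in RANKS if rank not in classification]
    if missing:
        raise ValueError(f"missing ranks: {missing}")
    return " > ".join(classification[rank] for rank in RANKS)


def lowest_shared_rank(a: dict, b: dict):
    """Walk down the ranks and return the last one at which a and b agree (or None)."""
    shared = None
    for rank in RANKS:
        if a[rank] != b[rank]:
            break
        shared = rank
    return shared


print(lineage(human))
print(lowest_shared_rank(human, cat))  # Class
```

Humans and cats part company below Mammalia, and because the ranks are fixed, every scientist asking the same question gets the same answer.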

Keyword AAA

Keyword AAA is known as the Functions Thesaurus (even though it doesn’t conform to the ISO thesaurus standard). It claims to be a thesaurus because it uses references to broader and narrower terms, but is in reality a taxonomy.

It’s used by NSW and Federal government departments as a method of classifying an organisation’s administrative records. Again, it is hierarchical in structure, with mutually exclusive categories.

It has three levels:

Keyword : Activity descriptors : Subject descriptors

An example of which might be:

FLEET MANAGEMENT : ACCIDENTS : Accident Report Forms
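Each term is just three levels joined by colons, so it’s easy to see how a system could check one. The vocabulary fragment below is hypothetical, invented for illustration rather than taken from the real Keyword AAA term list. It shows how prescriptive the scheme is: any keyword or activity not already in the list gets rejected.

```python
from typing import NamedTuple

# Hypothetical fragment: keyword -> permitted activity descriptors.
# Not the real Keyword AAA vocabulary; subject descriptors are left free here.
VOCABULARY = {
    "FLEET MANAGEMENT": {"ACCIDENTS"},
}


class AAATerm(NamedTuple):
    keyword: str
    activity: str
    subject: str


def parse_term(term: str) -> AAATerm:
    """Split 'KEYWORD : ACTIVITY : Subject' and check it against the vocabulary."""
    parts = [part.strip() for part in term.split(":")]
    if len(parts) != 3:
        raise ValueError(f"expected three levels, got {len(parts)}: {term!r}")
    keyword, activity, subject = parts
    if keyword not in VOCABULARY:
        raise ValueError(f"unknown keyword: {keyword}")
    if activity not in VOCABULARY[keyword]:
        raise ValueError(f"{activity} is not an activity under {keyword}")
    return AAATerm(keyword, activity, subject)


print(parse_term("FLEET MANAGEMENT : ACCIDENTS : Accident Report Forms"))
```

A public servant who types "car crash" gets an error rather than a filed record, which is exactly the friction I describe next.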

Ever tried to get a normal public servant to work this scheme out and classify their own records? It’s almost impossible because they just don’t think like this. They really only think in terms of ‘my stuff’ and how ‘my stuff relates to my work’. I think only records managers really understand this classification system.

Taxonomy of Educational Objectives

Now we come to Bloom’s Taxonomy, named after Benjamin Bloom (1956) - an educational psychologist at the University of Chicago. It classifies the different objectives and skills that educators set for students, dividing them into three “domains”:

  1. Affective
  2. Psychomotor, and
  3. Cognitive

Corporate taxonomy

Finally, we come to one of the taxonomies that everyone should recognise - the corporate taxonomy. Organisations use taxonomies, sometimes known as business thesauri or data dictionaries, to classify records, documents, digital assets (things on an intranet, internet or extranet), physical assets, or even processes, at any level of granularity.

Corporate taxonomies are increasingly used in information systems (particularly content management and knowledge management systems), as a way to allow instant access to the right information within exponentially growing volumes of data.

A corporate taxonomy is usually the fruit of a large harmonisation effort involving all the branches in an organisation. It is often developed, deployed and fine tuned over the years, while setting up knowledge management systems, in order to assure the survival and good use of valuable corporate know-how.

What do these things have in common?

So we’ve looked at a number of taxonomies - what do they have in common?

  • Standardised naming conventions
  • Highly structured
  • Generally hierarchical
  • Highly specific application (one taxonomy for one group of things)
  • Highly specific audience (content experts)
  • Prescriptive, rather than descriptive
  • Low adaptability (to new concepts)
  • Enables more efficient indexing and searching of content (indexing dogs will index all things about dogs)
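The "indexing dogs will index all things about dogs" point comes from the hierarchy: a search on a broad term can pull in everything filed under its narrower terms. Here’s a sketch with a made-up fragment of a taxonomy and made-up documents, assuming the hierarchy has no cycles.

```python
# Hypothetical taxonomy fragment: each term maps to its narrower terms.
NARROWER = {
    "Animals": ["Dogs", "Cats"],
    "Dogs": ["Working dogs", "Toy dogs"],
    "Working dogs": ["Sheepdogs"],
}

# Hypothetical documents, each filed under exactly one term.
DOCUMENTS = {"doc1": "Sheepdogs", "doc2": "Cats", "doc3": "Toy dogs"}


def all_narrower(term):
    """Return the term plus every narrower term beneath it (depth-first)."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(all_narrower(child))
    return result


def index_search(term):
    """Find documents filed under the term or anything narrower than it."""
    scope = set(all_narrower(term))
    return sorted(doc for doc, filed_under in DOCUMENTS.items() if filed_under in scope)


print(index_search("Dogs"))  # ['doc1', 'doc3']
```

The search works only because the documents were filed against the same agreed terms, which is the trade-off the rest of this list describes.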

Identifying when to use taxonomies

Given these aspects of taxonomies, you’d probably use a taxonomy, rather than, say, a folksonomy, when:

  • No existing internal method of classification within the group is complete or accepted or in use
  • The ‘things’ to be classified are known to be fixed and finite in number
  • One size needs to fit all
  • Everyone is dealing with the same kind of information – classification schemes like Linnean, for example, work well for scientists because they all deal with the same information: information about living things.

You could use a taxonomy for:

  • Naming conventions for corporate records and information
  • Naming conventions for documents in a document management system
  • The navigation structure for an intranet or an internet website

Some other things to think about

You could use a folk taxonomy to kick-off work on a taxonomy. I was on a recent engagement with the Office for Aboriginal and Torres Strait Islander Health (OATSIH), a part of the Department of Health located here in Canberra, where I discovered indigenous health services were making reports and using their own words and language to do so. A recommendation I made to OATSIH was to use these terms as the basis of forming their own taxonomy for reporting to the Minister on health expenditure.

Don’t assume that because you’ve got a corporate taxonomy you have to use it for everything. It makes sense to use different forms and types of information classification for different purposes. A single organisation should probably use one for its intranet, one for its internet site and one for recordkeeping. The difficulty is knowing which one to use when, and making the links between similar sorts of items.

So, how do I make a taxonomy?

At least, unlike a folk taxonomy, you can make a taxonomy. Firstly, consult your users. Interview them and find out how they work and how they classify their own knowledge artefacts. You could also use a workshop and card sorting method - one that I’m particularly fond of myself.
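If you do run an open card sort, one common way to analyse it is to count how often participants put each pair of cards in the same pile. The cards and piles below are invented, and this only shows the counting, not a full cluster analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical open card-sort results: each participant's piles of cards.
sorts = [
    [{"leave form", "timesheet"}, {"budget", "invoice"}],
    [{"leave form", "timesheet", "budget"}, {"invoice"}],
    [{"leave form", "timesheet"}, {"budget", "invoice"}],
]


def co_occurrence(sorts):
    """Count how many participants put each pair of cards in the same pile."""
    pairs = Counter()
    for participant in sorts:
        for pile in participant:
            for a, b in combinations(sorted(pile), 2):
                pairs[(a, b)] += 1
    return pairs


def agreement(sorts, a, b):
    """Share of participants who grouped cards a and b together (0.0 to 1.0)."""
    key = tuple(sorted((a, b)))
    return co_occurrence(sorts)[key] / len(sorts)


print(agreement(sorts, "timesheet", "leave form"))  # 1.0
```

Pairs that score close to 1.0 are strong candidates to sit together in the taxonomy. Low scores flag cards your users don’t agree on, which are worth raising when you ask them to validate the groupings.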

Make sure you map the processes that create information. You can learn a lot from documenting how people do things and why and seeing what is produced as a result. Don’t forget that an organisation’s website, legislation, reports, reviews, corporate documents, will reveal a lot about the business drivers and the broad groupings that already exist.

Finally, bring all the bits you’ve found together. Group and re-group like-with-like until it makes sense and then ask your users to validate what you’ve done. If it doesn’t make sense to them don’t worry! Just take their comments onboard and start again. It’s for them that you’re creating the taxonomy so you want to make sure it will work for them. If you base your work on what others have told you or what you’ve found they produce they will be more willing to accept the way you group things and the names you give them.

Business benefits

By following these simple guidelines you’ll create a great taxonomy. The business benefits that will follow will be:

  • Standardises things by creating rules for classifying your stuff
  • One-size-fits-all – everyone shares the same classification language
  • There is no ambiguity – everyone expects that cats are called “cats” and dogs are called “dogs”

Problems with taxonomies

I see a fair few IAs who see an opportunity to make a taxonomy and rush in. I’m not sure that taxonomies solve all problems, particularly when you think about:

  • High cost for development – taxonomies are very expensive to create and maintain, often involving month-long projects by several members of the team. For websites with thousands (or even millions) of pages, this Herculean task is sometimes never complete. As a result, broken taxonomies can remain until the design team attempts a complete redesign. Taxonomies may also fail to reflect the language of users if they are not fully tested with the target population. This results in a less effective site that leads to user failure, user frustration, or increased support costs.
  • One-size-fits-all – (yes, it’s a blessing and a curse) there’s no room for personal preferences for structuring or classifying information. There’s also no room for grey areas. Everything must fit into the classification scheme somewhere! In some taxonomies, something can only go into one category, not multiple categories.
  • They’re hard to adapt and re-use - every web site contains its own unique information, so there is no single classification scheme that works all the time
  • Can be difficult to incorporate new ideas – as we saw with Dewey, once the set of things the taxonomy classifies changes you’re probably going to have to redo the whole taxonomy to make it comprehensive again.
  • Low tolerance for difference – Do this the corporate way, not your way. The top-down approach of enforcing the taxonomy may not match the social psychological dynamic of the group. If they are all free-thinkers and it’s a generally flat organisation, telling them to do it a certain way might not match their way of thinking.

…and…

There are some inherent problems in not allowing people to classify information in their own terms. One is what’s known as analysis paralysis [2], where users have to shift their own thinking and choose an alien method of classification.


What often results is misclassification of information, because users don’t have a good understanding of the taxonomy - their world view just doesn’t mesh with the corporate world view.

A good change management process will take this issue into account and help users move from an alien way of thinking about classification to one where it is second nature. But sometimes, it’s the cue that a taxonomy is not what you really want or need.

So, if you don’t want a taxonomy, and there’s no existing folk taxonomy, what’s left?

….folksonomies!…

And with that, we’ll finish our look at information classification.

—-

[1] Wikipedia. Dewey Decimal Classification. Online at: http://en.wikipedia.org…_Classification, accessed on 4 February 2007.

[2] Sinha, R. 2005. A cognitive analysis of tagging (or how the lower cognitive cost of tagging makes it popular). 27 Sept. Online at: http://www.rashmisinha.com…cognitive.html, accessed on 4 February 2007.