Information classification (Part II) - Taxonomy

A few weeks ago, I created a post on information classification - folk taxonomy flavour. This time around, I’m going to talk about a bugbear for a few people: taxonomies.

Taxonomies

In general, taxonomies differ from folk taxonomies because they claim to be disembedded from social relations and thus objective and universal.

Again, rather than attempt to describe a taxonomy, let’s first look at some examples that call themselves taxonomies:

  • Dewey Decimal classification system (DDC)
  • Library of Congress
  • Alpha taxonomy
  • Keyword AAA
  • Taxonomy of Educational Objectives
  • Corporate taxonomy

Dewey decimal classification system (DDC)

Good ol’ Dewey is generally about books - a proprietary system of library classification with mutually exclusive categories. This works well for books because you can only really find each one in one place (being physical objects and all). It has high-level categories with what it calls recursive links to lower-level categories.

000 – Computer science, information, and general works
100 – Philosophy and psychology
200 – Religion
300 – Social sciences
400 – Language
500 – Science
600 – Technology
700 – Arts and recreation
800 – Literature
900 – History and geography
1337 - Internet Information
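
To make the "one place only" idea concrete, here is a minimal Python sketch that maps a Dewey number to its single main class using the hundreds digit. The function name `main_class` and the dictionary are purely illustrative assumptions, not part of any library software, and only the ten real main classes (000 to 900) are included.

```python
# The ten DDC main classes, keyed by the hundreds digit.
# Labels are copied from the list above; the structure is an illustration only.
DDC_MAIN_CLASSES = {
    0: "Computer science, information, and general works",
    1: "Philosophy and psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts and recreation",
    8: "Literature",
    9: "History and geography",
}


def main_class(number):
    """Return the one main class that a Dewey number falls under."""
    if not 0 <= number < 1000:
        raise ValueError(f"{number} is outside the 000-999 range")
    # Mutually exclusive categories: every number maps to exactly one class.
    return DDC_MAIN_CLASSES[int(number) // 100]


print(main_class(636.7))  # Technology
print(main_class(999))    # History and geography
```

Because the lookup depends only on the hundreds digit, a book can never land in two main classes at once, which is exactly the property that suits physical shelving.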

One problem that struck me hard about Dewey was the way it classifies religion. 200-289 largely represents Christianity, leaving 290-299 for the ‘others’!? No wonder we have trouble with the other Abrahamic faiths of the world. This is obviously the world view of the West for the West.

And what happens if we discover other worlds? Dewey only uses 999 to cover ‘Extraterrestrial worlds’ for history and geography. One number is supposed to cover all that!?

Did someone screw up their taxonomy design process when they put this together? Well, both yes and no. Dewey was created in 1876 [1], when people’s world-view was quite different. Over 100 years later, the taxonomy is showing its age: the number and variety of things it is supposed to classify have changed.

Overall, Dewey is widely considered to be theoretically inferior to other more modern systems which make freer use of alphabetical characters to produce shorter classmarks for concepts of equal complexity, though it continues to offer a more expressive format than the Library of Congress Classification developed shortly afterward.

Library of Congress

This taxonomy is similar to Dewey Decimal with a number of high-level categories:

A. General Works
B. Philosophy. Psychology. Religion
C. Auxiliary Sciences Of History
D. History: General And Old World
E. History: America
F. History: America, Local
G. Geography. Anthropology. Recreation
H. Social Sciences
J. Political Science
K. Law
L. Education
M. Music And Books On Music
N. Fine Arts
P. Language And Literature
Q. Science
R. Medicine
S. Agriculture
T. Technology
U. Military Science
V. Naval Science
Z. Bibliography. Library Science. Information Resources (General)

Two whole categories, E-F, just for American history, yet only one for everyone else? … makes me wonder.

Alpha taxonomy

Taxonomy, sometimes alpha taxonomy, is the science of finding, describing and naming organisms.

The taxonomy used is Linnean Taxonomy with 7 levels - Kingdom, Phylum, Class, Order, Family, Genus, and Species.

  1. Kingdom: Animalia (with eukaryotic cells having cell membrane but lacking cell wall, multicellular, heterotrophic)
  2. Phylum: Chordata (all animals with a notochord)
  3. Class: Mammalia (vertebrates with mammary glands that in females secrete milk to nourish young, hair, warm-blooded, bears live young)
  4. Order: Primates (collar bone, eyes face forward, grasping hands with fingers, two types of teeth: incisors and molars)
  5. Family: Hominidae (upright posture, large brain, stereoscopic vision, flat face, hands and feet have different specializations)
  6. Genus: Homo (s-curved spine, “man”)
  7. Species: Homo sapiens (high forehead, well-developed chin, skull bones thin)
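
As a rough sketch of how the seven ranks behave as a strict hierarchy, the Python below stores a lineage as one name per rank and reports how far down two organisms agree before they diverge. The helper names `lineage` and `shared_ranks` are made up for this example, and the wolf lineage is my own addition for comparison rather than something from the list above.

```python
# The seven Linnean ranks, from broadest to most specific.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def lineage(*taxa):
    """Pair each of the seven ranks with a taxon name, broadest rank first."""
    if len(taxa) != len(RANKS):
        raise ValueError(f"expected {len(RANKS)} taxa, got {len(taxa)}")
    return dict(zip(RANKS, taxa))


def shared_ranks(a, b):
    """Return the ranks two lineages have in common, stopping at the first difference."""
    shared = []
    for rank in RANKS:
        if a[rank] != b[rank]:
            break
        shared.append(rank)
    return shared


human = lineage("Animalia", "Chordata", "Mammalia", "Primates",
                "Hominidae", "Homo", "Homo sapiens")
# Assumption for illustration: the grey wolf, which is not in the original list.
wolf = lineage("Animalia", "Chordata", "Mammalia", "Carnivora",
               "Canidae", "Canis", "Canis lupus")

print(shared_ranks(human, wolf))  # ['Kingdom', 'Phylum', 'Class']
```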

Keyword AAA

Keyword AAA is known as the Functions Thesaurus (even though it doesn’t conform to the ISO thesaurus standard). It claims to be a thesaurus because it uses references to broader and narrower terms, but is in reality a taxonomy.

It’s used by NSW Government and Federal Government departments to classify the administrative records of a government organisation. Again, it is hierarchical in structure, with mutually exclusive categories.

It has three levels:

Keyword : Activity descriptors : Subject descriptors

An example of which might be:

FLEET MANAGEMENT : ACCIDENTS : Accident Report Forms
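
To show what the three levels look like to a system rather than a person, here is a small Python sketch that splits a title like the one above into its keyword, activity and subject parts. The `RecordTitle` type and `parse_title` function are hypothetical names for this illustration; no real Keyword AAA software or term list is being modelled.

```python
from typing import NamedTuple


class RecordTitle(NamedTuple):
    """The three levels of a title: keyword, activity descriptor, subject descriptor."""
    keyword: str
    activity: str
    subject: str


def parse_title(title):
    """Split 'KEYWORD : ACTIVITY : Subject' into its three levels."""
    parts = [part.strip() for part in title.split(":")]
    # Exactly three non-empty levels are required; anything else is rejected.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three levels separated by ':', got {title!r}")
    return RecordTitle(*parts)


record = parse_title("FLEET MANAGEMENT : ACCIDENTS : Accident Report Forms")
print(record.keyword)   # FLEET MANAGEMENT
print(record.activity)  # ACCIDENTS
print(record.subject)   # Accident Report Forms
```

The strictness is the point: a title with two levels or four simply does not fit, which hints at why people who think in terms of ‘my stuff’ struggle with it.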

Ever tried to get a normal public servant to work this scheme out and classify their own records? It’s almost impossible because they just don’t think like this. They really only think in terms of ‘my stuff’ and how ‘my stuff relates to my work’. I think only records managers really understand this classification system.

Taxonomy of Educational Objectives

Now we come to Bloom’s Taxonomy, named after Benjamin Bloom (1956), an educational psychologist at the University of Chicago. It classifies the different objectives and skills that educators set for students and divides them into three “domains”:

  1. Affective
  2. Psychomotor, and
  3. Cognitive

Corporate taxonomy

Finally, we come to one of the taxonomies that everyone should recognise - the corporate taxonomy. Organisations use taxonomies, sometimes known as business thesauri or data dictionaries, to classify records, documents, digital assets (things on an intranet, internet or extranet), physical assets, or even processes, at any level of granularity.

Corporate taxonomies are increasingly used in information systems (particularly content management and knowledge management systems), as a way to allow instant access to the right information within exponentially growing volumes of data.

A corporate taxonomy is usually the fruit of a large harmonisation effort involving all the branches in an organisation. It is often developed, deployed and fine tuned over the years, while setting up knowledge management systems, in order to assure the survival and good use of valuable corporate know-how.

What do these things have in common?

So we’ve looked at a number of taxonomies - what do they have in common?

  • Standardised naming conventions
  • Highly structured
  • Generally hierarchical
  • Highly specific application (one taxonomy for one group of things)
  • Highly specific audience (content experts)
  • Prescriptive, rather than descriptive
  • Low adaptability (to new concepts)
  • Enables more efficient indexing and searching of content (indexing dogs will index all things about dogs)
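
The last point, that indexing dogs will index all things about dogs, can be sketched in a few lines of Python. Everything here is a made-up toy: the term hierarchy in `NARROWER`, the document titles, and the `search` function are assumptions for illustration, with each document filed under exactly one term.

```python
# Toy hierarchy: each term lists its narrower terms.
NARROWER = {
    "animals": ["dogs", "cats"],
    "dogs": ["working dogs"],
    "cats": [],
    "working dogs": [],
}

# Each document is filed under exactly one term (mutually exclusive categories).
DOCUMENTS = {
    "Kelpie training notes": "working dogs",
    "Dog registration form": "dogs",
    "Cat curfew policy": "cats",
}


def terms_under(term):
    """Return the term itself plus every narrower term beneath it."""
    found = [term]
    for child in NARROWER.get(term, []):
        found.extend(terms_under(child))
    return found


def search(term):
    """Find every document filed under the term or any of its narrower terms."""
    wanted = set(terms_under(term))
    return sorted(title for title, filed in DOCUMENTS.items() if filed in wanted)


print(search("dogs"))  # ['Dog registration form', 'Kelpie training notes']
```

Searching on the broader term picks up the narrower one for free, which is where the efficiency comes from. It also only works if everyone files things under the agreed terms in the first place.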

Identifying when to use taxonomies

Given these aspects of taxonomies, you’d probably use a taxonomy, rather than, say, a folksonomy, when:

  • No existing internal method of classification within the group is complete, accepted or in use
  • The ‘things’ to be classified are known to be fixed and finite in number
  • One size needs to fit all

Classification schemes like Linnean, for example, work well for scientists because everyone is dealing with the same information: the information about living things.

You could use a taxonomy for:

  • Naming conventions for corporate records and information
  • Naming conventions for documents in a document management system
  • The navigation structure for an intranet or an internet website

Some other things to think about

You could use a folk taxonomy to kick off work on a taxonomy. I was on a recent engagement with the Office for Aboriginal and Torres Strait Islander Health (OATSIH), a part of the Department of Health located here in Canberra, where I discovered indigenous health services were making reports and using their own words and language to do so. A recommendation I made to OATSIH was to use these terms as the basis for forming their own taxonomy for reporting to the Minister on health expenditure.

Don’t think that because you’ve got a corporate taxonomy you have to use it for everything. It makes sense to use different forms and types of information classification for different purposes. A single organisation should probably use one for its intranet, one for its internet, one for recordkeeping. The difficulty is knowing which one to use when, and making the links between similar sorts of items.

So, how do I make a taxonomy?

At least, unlike a folk taxonomy, you can make a taxonomy. Firstly, consult your users. Interview them and find out how they work and how they classify their own knowledge artefacts. You could also use a workshop and card sorting method - one that I’m particularly fond of myself.

Make sure you map the processes that create information. You can learn a lot from documenting how people do things and why and seeing what is produced as a result. Don’t forget that an organisation’s website, legislation, reports, reviews, corporate documents, will reveal a lot about the business drivers and the broad groupings that already exist.

Finally, bring all the bits you’ve found together. Group and re-group like-with-like until it makes sense and then ask your users to validate what you’ve done. If it doesn’t make sense to them, don’t worry! Just take their comments on board and start again. It’s for them that you’re creating the taxonomy, so you want to make sure it will work for them. If you base your work on what others have told you, or on what you’ve found they produce, they will be more willing to accept the way you group things and the names you give them.
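
If you run a card sort, one simple way to start the "group like-with-like" step is to count how often participants put the same two cards in the same pile. The Python below is only a sketch of that tally under my own assumptions: the card names, the three participants' sorts and the `pair_counts` function are all invented for illustration, and real card-sort analysis usually goes further than this.

```python
from collections import Counter
from itertools import combinations


def pair_counts(sorts):
    """Count how many participants placed each pair of cards in the same pile."""
    counts = Counter()
    for piles in sorts:
        for pile in piles:
            # Sort the cards so each pair is always recorded in the same order.
            for first, second in combinations(sorted(pile), 2):
                counts[(first, second)] += 1
    return counts


# Hypothetical results: each participant's sort is a list of piles of cards.
SORTS = [
    [{"invoices", "receipts"}, {"leave forms", "timesheets"}],
    [{"invoices", "receipts", "timesheets"}, {"leave forms"}],
    [{"invoices", "receipts"}, {"leave forms", "timesheets"}],
]

counts = pair_counts(SORTS)
for pair, times in counts.most_common(2):
    print(pair, times)
# ('invoices', 'receipts') 3
# ('leave forms', 'timesheets') 2
```

Pairs that nearly everyone groups together are good candidates for sharing a category; pairs with split results are the ones worth taking back to your users.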

Business benefits

By following these simple guidelines you’ll create a great taxonomy. The business benefits that follow are:

  • Standardisation – you create rules for classifying your stuff
  • One-size-fits-all – everyone shares the same classification language
  • No ambiguity – everyone expects that cats are called “cats” and dogs are called “dogs”

Problems with taxonomies

I see a fair few IAs who see an opportunity to make a taxonomy and rush in. I’m not sure that taxonomies solve all problems, particularly when you think about:

  • High cost for development – taxonomies are very expensive to create and maintain, often involving month-long projects by several members of the team. For websites with thousands (or even millions) of pages, this Herculean task is sometimes never complete. As a result, broken taxonomies can remain until the design team attempts a complete redesign.
  • Mismatch with users’ language – taxonomies may fail to reflect the language of users if they are not fully tested with the target population. This results in a less effective site that leads to user failure, user frustration, or increased support costs.
  • One-size-fits-all – (yes, it’s a blessing and a curse) there’s no room for personal preferences for structuring or classifying information. There’s also no room for grey areas. Everything must fit into the classification scheme somewhere! In some taxonomies, something can only go into one category, not several.
  • They’re hard to adapt and re-use – every web site contains its own unique information, so there is no single classification scheme that works all the time.
  • Can be difficult to incorporate new ideas – as we saw with Dewey, once the set of things the taxonomy classifies changes you’re probably going to have to redo the whole taxonomy to make it comprehensive again.
  • Low tolerance for difference – do this the corporate way, not your way. The top-down approach of reinforcing the taxonomy may not match the social psychological dynamic of the group. If they are all free-thinking and it’s generally a flat organisation, telling them to do it a certain way might not match their way of thinking.

…and…

There are some inherent problems in not allowing people to classify information in their own terms. This is the concept of analysis paralysis [2] where users have to shift their own thinking and choose an alien method of classification.

[Image: analysis paralysis]

What results is often misclassification of information because users haven’t got a good understanding of the taxonomy - their world view just doesn’t mesh with the corporate world view.

A good change management process will take this issue into account and help users move from an alien way of thinking about classification to one where it is second nature. But sometimes, it’s the cue that a taxonomy is not what you really want or need.

So, if you don’t want a taxonomy, and there’s no existing folk taxonomy, what’s left?

….folksonomies!…

And with that, we’ll finish our look at information classification.

—-

[1] Wikipedia. Dewey Decimal Classification. Online at: http://en.wikipedia.org…_Classification, accessed on 4 February 2007.

[2] Sinha, R. 2005. A cognitive analysis of tagging (or how the lower cognitive cost of tagging makes it popular). 27 Sept. Online at: http://www.rashmisinha.com…cognitive.html, accessed on 4 February 2007.
