Oz-IA 2007 — Semantic Analysis in IA
24 September, 2007
While I’ve blogged about semantic analysis before, I figured it was only polite to throw my presentation up onto SlideShare for everyone to view.
Enjoy
M
Oz-IA 2007 is coming soon — September 22nd and 23rd, in Sydney, Australia. With a little nudge from Andrew Boyd, I’ve decided to submit an abstract on my work on semantic analysis of medical restrictions text.
This will be the first time I’ve spoken at a conference, so I’m really looking forward to it.
This work has come some way since that post. I’ve now completed a series of prototype wireframes and presented them to our project’s leaders to gauge their opinion on turning the analysis into a tool for creating codified text in the future. You’ll be able to see these wireframes at Oz-IA.
The suggested interface breaks the creation of medical restrictions into discrete sections. For example: who can prescribe the drug; what they have to do to prescribe it; to whom they can prescribe it and for what condition; etc. Each of these sections is then given a set of parameters (phrases made up of key nouns, verbs, adjectives, and adverbs) informed by the semantic analysis.
The result for each section is a bit like one of those sets of word fridge magnets, designed to be rearranged into a kind of magnetic poetry.

This kind of interface will let users easily build the phrases they need to describe prescribing restrictions, in a form the underlying system can readily code and that reflects the way users already think about medical restrictions.
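To make the fridge-magnet idea a little more concrete, here’s a minimal Python sketch. The section names, the phrase lists, and the sentence template are all hypothetical placeholders (not real PBS restriction wording). Users can only pick a phrase from each section’s list, so every restriction comes out both as readable text and as a coded form (section plus phrase index) that a system could store.

```python
# Hypothetical sections and phrase "magnets", for illustration only;
# real phrase lists would come out of the semantic analysis.
SECTIONS = {
    "prescriber": ["a specialist physician", "a general practitioner"],
    "prerequisite": ["has confirmed the diagnosis", "has documented prior treatment"],
    "patient": ["an adult patient", "a paediatric patient"],
    "condition": ["with severe disease", "with moderate disease"],
}

# Assumed sentence template: one slot per section, in a fixed order.
TEMPLATE = "Prescribed by {prescriber} who {prerequisite}, for {patient} {condition}."


def compose(choices):
    """Build a restriction from one chosen phrase per section.

    Returns (coded, text): coded is a list of (section, phrase index) pairs
    the system can store; text is the human-readable sentence.
    Raises ValueError if a phrase isn't one of that section's magnets.
    """
    coded = []
    for section, allowed in SECTIONS.items():
        phrase = choices[section]
        if phrase not in allowed:
            raise ValueError(f"{phrase!r} is not an allowed phrase for {section!r}")
        coded.append((section, allowed.index(phrase)))
    return coded, TEMPLATE.format(**choices)


choices = {
    "prescriber": "a specialist physician",
    "prerequisite": "has confirmed the diagnosis",
    "patient": "an adult patient",
    "condition": "with severe disease",
}
coded, text = compose(choices)
print(coded)  # [('prescriber', 0), ('prerequisite', 0), ('patient', 0), ('condition', 0)]
print(text)   # Prescribed by a specialist physician who has confirmed the diagnosis, for an adult patient with severe disease.
```

Anything outside the allowed magnets is rejected, which is the point of the design: the readable sentence and its coded form can’t drift apart.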
It’ll be very exciting to show this work in Sydney. I hope you can join me.
M
It seems like such a long time ago that I was at university. I studied psychology at the University of Newcastle and figured I could do with a non-science subject, so I chose linguistics. I thought it would just be a fun subject. Sure, people who did psych and linguistics moved into postgraduate studies in speech pathology, but that’s not what I wanted to do. I wanted to do organisational psychology.
A nasty problem came to my attention recently. The Powers-That-Be needed someone to turn scary-bad medical restriction text into something machine-usable. This text is the checklist a doctor needs to go through to ensure his patient meets all of the qualifications in order to receive a prescription for the drug.
The language of this text can be very complex, with different people writing in slightly different ways and styles. To make things more complex, the format, structure, and style have evolved over time, so different medicines present almost the same type of information in slightly different ways.
If you look over on the pbs.gov.au website for the medicine called etanercept you’ll begin to understand some of the scope of the issue.
Yes, it’s a mess and it looks like chaos personified…
….I think you’re now supposed to hear an evil laugh in the distance…
No one knew what to do with the restrictions text — how to improve the process that creates them, how to analyse and categorise the content, or how to codify it. However, I figured that it wouldn’t be that bad. One person was stunned when I began to talk this way.
“What language are you going to use to codify the restrictions?!?!”, he gasped.
“English!”, I replied.
I think he thought I was joking.
What I meant was that I could analyse the text using a technique from my linguistics days. I knew that every sentence can be broken down into three things: the subject, the verb, and the object.

Even when a sentence doesn’t appear to contain all of these things, they are there; they’re just implied. Because we’re native speakers of the language, we don’t need to see (or hear) them in order to understand the meaning of the sentence. For example, if I were to say:
The apple is red.
What I really mean by this sentence is:
The apple is a red apple.
This kind of study of the structure and meaning of sentences is called semantic analysis.
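As a toy illustration of making an implied element explicit, the sketch below stores a clause as a subject-verb-object triple and fills in the object slot for a predicate adjective, turning “The apple is red” into “The apple is a red apple”. The `Clause` class and function names are hypothetical, and the a/an rule is a rough spelling-based approximation; real semantic analysis covers far more than this one pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """A subject-verb-object triple; every slot is filled, even if implied."""
    subject: str
    verb: str
    obj: str

    def render(self) -> str:
        return f"The {self.subject} {self.verb} {self.obj}."


def article(word: str) -> str:
    # Rough rule based on spelling, not pronunciation ("an hour" would fail).
    return "an" if word[0].lower() in "aeiou" else "a"


def expand_predicate_adjective(subject: str, verb: str, adjective: str) -> Clause:
    """Make the implied noun explicit: 'The apple is red' -> object 'a red apple'."""
    return Clause(subject, verb, f"{article(adjective)} {adjective} {subject}")


clause = expand_predicate_adjective("apple", "is", "red")
print(clause.render())  # The apple is a red apple.
```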
Computers usually have a tough time making sense of language because everyone uses a different style in different situations, even when the meaning is the same. My job, therefore, was to cast a human eye over the way the medical restriction text was written, put the sense back into the text, and develop rules or a taxonomic framework by which restrictions could be written in the future, so the wording could be more easily plopped into your friendly neighbourhood database.
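Here’s a minimal sketch of what “plopping the wording into a database” could look like once restrictions are structured, using Python’s built-in `sqlite3` module with an in-memory database. The table layout and the sample rows are hypothetical, chosen only to show that a structured restriction can be queried directly rather than searched as free text.

```python
import sqlite3

# Hypothetical schema: one row per structured phrase within a restriction.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE restriction_phrase ("
    " restriction_id INTEGER, section TEXT, phrase TEXT)"
)

# Two hypothetical restrictions, already broken into sections.
rows = [
    (1, "prescriber", "specialist physician"),
    (1, "condition", "severe disease"),
    (2, "prescriber", "general practitioner"),
    (2, "condition", "moderate disease"),
]
conn.executemany("INSERT INTO restriction_phrase VALUES (?, ?, ?)", rows)

# With structured wording, "which restrictions need a specialist?"
# becomes a simple query instead of a free-text search.
ids = [row[0] for row in conn.execute(
    "SELECT restriction_id FROM restriction_phrase"
    " WHERE section = ? AND phrase = ? ORDER BY restriction_id",
    ("prescriber", "specialist physician"),
)]
conn.close()
print(ids)  # [1]
```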
So how do you do semantic analysis? You need to know a few things:
I won’t go into great detail about these things; suffice it to say that with the first two, you can create trees that reflect how nouns, verbs, adverbs, and adjectives group together.

You don’t really need to know the meaning of any of the words: their placement in the sentence structure will tell you which Lego brick was used. There are even some tools to help with this job that linguists themselves get excited about.
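Parsing tools build these trees automatically; the sketch below writes one out by hand as nested Python tuples, so you can see how words group into phrases without knowing what any of them mean. The sentence is made up, the tags follow the Penn Treebank convention (DT determiner, NN noun, VBZ verb), and the helper names are hypothetical.

```python
# A hand-written constituency tree: (label, child, child, ...).
# A leaf is (part_of_speech, word). The sentence is a made-up example.
TREE = (
    "S",
    ("NP", ("DT", "The"), ("NN", "specialist")),
    ("VP", ("VBZ", "prescribes"),
           ("NP", ("DT", "the"), ("NN", "drug"))),
)


def is_leaf(node):
    """A leaf is a (tag, word) pair whose second item is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Yield (tag, word) pairs from left to right."""
    if is_leaf(node):
        yield node
    else:
        for child in node[1:]:
            yield from leaves(child)


def phrases(node, label):
    """Return the text of every subtree carrying the given label, e.g. 'NP'."""
    found = []
    if not is_leaf(node):
        if node[0] == label:
            found.append(" ".join(word for _, word in leaves(node)))
        for child in node[1:]:
            found.extend(phrases(child, label))
    return found


nouns = [word for tag, word in leaves(TREE) if tag.startswith("NN")]
print(nouns)                # ['specialist', 'drug']
print(phrases(TREE, "NP"))  # ['The specialist', 'the drug']
print(phrases(TREE, "VP"))  # ['prescribes the drug']
```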
Once you identify the nouns, you can use them to create a taxonomy. Once you work out the groupings of words, you can create a business process to produce those groupings. Once you have worked out both, you can create a framework (like the mind map below) that helps users write restrictions text driven by process rather than individual whims, gives all restrictions text a consistent structure, and is infinitely more configurable!
The obvious next step is to take this information and the framework and build a tool that will help users create restrictions text.
You can use the same approach for any body of free text. Analyse the sentences, look for the nouns, group the nouns using card sorting and voilà! You have a taxonomy. If it’s web content you’re dealing with, use this approach for your content audit (determining what sort of content you have) and use the taxonomy as the basis for a new way of navigating that content.
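A minimal sketch of that noun-to-taxonomy step, assuming the sentences have already been tagged (by hand here) and that a card-sorting session has already produced the categories. The sentences, categories, and function names are all hypothetical. The noun counts double as a rough content audit, and the leftover list holds nouns the card sort didn’t place, ready for another round.

```python
from collections import Counter

# Hand-tagged, made-up sentences as (word, part-of-speech) pairs.
TAGGED = [
    [("The", "DT"), ("specialist", "NN"), ("confirms", "VBZ"),
     ("the", "DT"), ("diagnosis", "NN")],
    [("The", "DT"), ("patient", "NN"), ("has", "VBZ"),
     ("severe", "JJ"), ("arthritis", "NN")],
    [("The", "DT"), ("physician", "NN"), ("documents", "VBZ"),
     ("the", "DT"), ("diagnosis", "NN"), ("and", "CC"),
     ("prior", "JJ"), ("treatment", "NN")],
]

# Categories a card-sorting session might produce (hypothetical).
CARD_SORT = {
    "Prescriber": {"specialist", "physician"},
    "Patient": {"patient"},
    "Condition": {"arthritis"},
    "Clinical evidence": {"diagnosis"},
}


def candidate_nouns(sentences):
    """Count every noun (tags starting with NN), lower-cased, across all sentences."""
    return Counter(word.lower()
                   for sentence in sentences
                   for word, tag in sentence
                   if tag.startswith("NN"))


def build_taxonomy(noun_counts, card_sort):
    """Place each noun under its card-sort category; return leftovers separately."""
    taxonomy = {category: sorted(n for n in noun_counts if n in members)
                for category, members in card_sort.items()}
    placed = set().union(*card_sort.values())
    unsorted = sorted(n for n in noun_counts if n not in placed)
    return taxonomy, unsorted


counts = candidate_nouns(TAGGED)
taxonomy, unsorted = build_taxonomy(counts, CARD_SORT)
print(counts.most_common(1))  # [('diagnosis', 2)]
print(taxonomy)
print(unsorted)               # ['treatment']
```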
Semantic analysis is also good because Google uses it. Well, that is, Google uses semantic analysis to index content. If you know how to use semantic analysis to improve the quality of your content (by improving its structure), then you can improve the relevance ranking of your content. Lots of enterprise search engines also use semantic analysis, and even translation programs are starting to use it. None of these applications cares about the meaning of a sentence; rather, they look for the structural patterns inherent in the language and make judgements about the Lego bricks (is this a noun, a verb, etc.?) that make up the sentence. The take-home message here is that:
if you write well, then search engines will be more accurate when indexing content and searching through it.
Finally, to do semantic analysis you need some good tools. Here’s my top five:
M