by Paul Rudo on 09/07/12 at 8:19 pm
My wife is a doctor, a poet, and a warrior.
Of course, she is none of those things. But just by reading this, you were able to infer what I meant by those statements. This intuition and ability to “read between the lines” is a core part of human intelligence which has been very difficult to simulate consistently using efficient computer algorithms.
In the early days of the Internet, search engines would help users locate web sites based on keyword density. This approach had a number of crucial flaws which made it difficult to maintain high-quality search results (a rough sketch of the scoring idea follows the list):
- Publishers of web sites had to pick their wording carefully to ensure that the text of the site contained every phrasing a person might type when searching for information on a specific topic.
- Publishers had to make complicated modifications to their web sites in order to comply with the search engines’ algorithms, such as carefully placing keywords in the title, image descriptions, and meta tags.
- Internet spammers took advantage of these algorithms to create misleading promotional pages, which would rank higher in the search engines than legitimate, high-quality content.
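To make the keyword-density idea concrete, here is a minimal sketch of how such a score might have been computed. The formula, the toy pages, and the function name are my own illustration of the concept, not any real engine’s ranking code.

```python
# A rough sketch of keyword-density scoring: the share of a page's words
# that match any query term. Purely illustrative.
import re
from collections import Counter

def keyword_density(page_text, query):
    words = re.findall(r"[a-z']+", page_text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    matches = sum(counts[term] for term in query.lower().split())
    return matches / len(words)

pages = {
    "stuffed": "cheap tickets cheap tickets cheap tickets buy now",
    "honest":  "an honest review of budget airline tickets and fees",
}
for name, text in pages.items():
    print(name, round(keyword_density(text, "cheap tickets"), 3))
```

The keyword-stuffed page wins by a wide margin, which is exactly the weakness the list above describes.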
Google aimed to solve this problem by ranking web pages on external social factors, such as inbound links and anchor text, which were harder to fake. Of course, even this approach had its flaws (a simplified link-ranking sketch follows the list):
- A consumer looking for “low-cost Kleenex” may be a good fit for a store selling “generic paper towels”, but the keywords would be poorly matched.
- SEO spammers created complex link-spam networks which were often hard to detect, giving them an unfair advantage over less aggressive, legitimate businesses.
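Google’s link-based approach is usually described in terms of PageRank. The power-iteration sketch below is a simplified illustration of that general idea; the tiny link graph and the damping factor of 0.85 are assumptions for the example, not Google’s actual algorithm.

```python
# Simplified PageRank-style power iteration: pages repeatedly pass a share
# of their rank to the pages they link to.
def link_rank(links, damping=0.85, iters=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages          # dangling pages spread rank evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

graph = {
    "home": ["blog", "shop"],
    "blog": ["home"],
    "shop": ["home"],
    "spam": ["home"],   # nobody links back to "spam", so its rank stays low
}
print(link_rank(graph))
```

The intuition is that a page only ranks well if other pages vouch for it with links, which is much harder for a single publisher to fake than keyword stuffing.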
As a result, Google and other search engines have gradually upgraded their algorithms to fine-tune results for relevance. One of the most important sets of tools behind these upgrades is “latent semantic indexing”, which helps the engine interpret the language and context of the page being analyzed.
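Latent semantic indexing itself has a fairly simple core: build a term-document matrix and reduce it with a truncated singular value decomposition, so that documents are compared by shared “concepts” rather than exact keywords. Here is a minimal sketch; the toy documents and the choice of two concepts are illustrative assumptions.

```python
# Minimal latent semantic indexing: term-document matrix + truncated SVD,
# then compare documents in the reduced "concept" space.
import numpy as np

docs = [
    "cheap flights and cheap airline tickets",
    "low cost plane tickets and airfare deals",
    "recipes for baking bread at home",
]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                        # keep the two strongest concepts
doc_vecs = (np.diag(S[:k]) @ Vt[:k]).T       # one row per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The first two documents share only a couple of words, yet land close
# together in concept space; the bread recipe ends up far from both.
print(cosine(doc_vecs[0], doc_vecs[1]), cosine(doc_vecs[0], doc_vecs[2]))
```

This is what lets a query about “low-cost” tickets surface a page written with entirely different wording, as long as the two live near each other in concept space.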
The ability of computers to understand and interpret language has important implications which go well beyond search. These capabilities will become very important in business intelligence, big data, electronic discovery, counter-terrorism, customer-facing self-service applications, and other areas.
The importance of language interpretation was seen in IBM’s Watson Jeopardy challenge, where the system had to interpret complex trivia questions which were crafted using puns, plays on words, and double-entendres.
Israel’s ADAMA research project, backed by $1.4 million in funding, will be working on the complex challenge of helping computers better interpret metaphors used in language.
A metaphor takes two things which are different and compares them through their similar attributes. When you make a metaphor, you are saying that the target shares the qualities of the source.
When you say that “this steak is charcoal”, you are saying that it is hard, black, and thoroughly overcooked.
Metaphor analysis is very complex because the computer has to understand all of the attributes associated with specific words, in order to compare multiple terms and link their common attributes.
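As a toy illustration of that attribute-matching step, imagine each word carrying a small set of attributes, with the metaphor transferring the source’s attributes onto the target. The hand-written attribute sets below are assumptions for the example; a real system would have to learn such associations from large amounts of text.

```python
# Toy metaphor interpretation: transfer the source word's attributes onto
# the target. The attribute sets are hand-written for illustration only.
ATTRIBUTES = {
    "charcoal": {"black", "hard", "dry", "burned"},
    "steak":    {"food", "meat", "grilled"},
    "warrior":  {"brave", "fierce", "determined"},
    "doctor":   {"healing", "knowledgeable", "caring"},
}

def interpret_metaphor(target, source):
    """Return the qualities the metaphor claims the target takes from the source."""
    source_attrs = ATTRIBUTES.get(source, set())
    target_attrs = ATTRIBUTES.get(target, set())
    # Transfer the source's attributes, minus anything the target already has.
    return source_attrs - target_attrs

# "This steak is charcoal" -> {'black', 'hard', 'dry', 'burned'}
print(interpret_metaphor("steak", "charcoal"))
```

Even this crude version hints at the real difficulty: the program only works if it already knows which attributes matter for every word, and, as the next paragraph notes, those attributes shift with culture and context.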
Another layer of complexity in metaphor analysis is that culture can significantly influence a metaphor’s meaning. In North America, calling someone a “chicken” implies that they are a coward. But in South America, calling someone a “chicken” implies that they are promiscuous.
Some metaphors also exist within subcultures, and others even exist only between certain individuals. So if we are able to break the code and develop a consistent, automated means of detecting and interpreting metaphors, it could have some important practical uses.