In its fight for search market share, Microsoft has finally acquired Powerset (a natural language processing system). The system is really impressive and could be of real value to users. For example, if you type in ‘How tall is the Eiffel Tower’, you get the answer directly: 300.65 meters.
According to an interview on TechCrunch, the Microsoft search team will integrate Powerset’s functionality into its search engine (Live.com) by the end of this year.
Below is a simple explanation of how it works (by Barney Pell):
[…]I guess one way to think about it is like when you are learning how to diagram sentences in elementary school. You draw these trees of a sentence and find here is the noun phrase and a noun phrase has a determiner like “the” and then it has a noun like “dog” and here is a verb phrase, and it might have a verb like “barks” and then what does it mean for that word, bark is a verb and it has an “S” at the end and the way that it works, which we call morphology, that’s the present tense of that verb. And then the whole sentence is composed of those pieces, and so the meaning is built out of those. So you draw these diagrams when you are learning how to do it. And the kind of knowledge that a natural language processing system like Powerset is using is sort of like that. It’s basically extracting out both the surface structure, that kind of a tree structure of a sentence, and then it’s converting that into a series of different representations, ultimately into one which is expressing that thing in fact. So it will basically say that there is a kind of activity here and it is a barking activity and the thing that is doing that activity, the subject of that activity, is a dog. Ok. So it is going from that sort of a surface structure of the language that you are seeing and converting it into a semantic factor representation. In addition, it is then able to draw on the individual meaning and relationships between words so if you saw that the sentence said “The poodle barks,” then the system knows, if it can draw upon other knowledge about the relationship between words, as Powerset does, that poodles are a kind of dog. So if you as the user were able to say, “I want dogs barking” then it can actually then match the concept of dog to the concept of poodle and it is matching barking to barking and it is then doing this sort of semantic match for you which uses words you are not even using in your query and matching those against the document.[…]
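To make the idea concrete, here is a tiny Python sketch of the kind of matching described above. It is only an illustration and certainly not how Powerset is implemented: the `HYPERNYMS` table, the stop-word list, the crude `stem` function and the "last two content words are subject and action" rule are all made-up simplifications standing in for a real parser and a real lexical knowledge base.

```python
# Toy illustration of semantic matching; NOT Powerset's actual implementation.
# All names and data below are hypothetical stand-ins.

# Hand-written "is a kind of" table: each word points to its more general concept.
HYPERNYMS = {"poodle": "dog", "dog": "animal"}

# Words carrying no meaning for this toy example (determiners, query filler).
STOP_WORDS = {"the", "a", "an", "i", "want"}


def stem(word):
    """Crude morphology: strip a trailing '-ing' or '-s' ('barks' -> 'bark')."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def extract_fact(text):
    """Reduce a short sentence to a (subject, action) pair.

    Assumes the last two content words are the subject and the action,
    which only holds for inputs shaped like 'The poodle barks.'.
    """
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    content = [stem(w) for w in words if w and w not in STOP_WORDS]
    if len(content) < 2:
        raise ValueError("need at least a subject and an action")
    return content[-2], content[-1]


def is_a(word, concept):
    """Walk up the hypernym chain: is `word` a kind of `concept`?"""
    while word is not None:
        if word == concept:
            return True
        word = HYPERNYMS.get(word)
    return False


def matches(query, sentence):
    """A sentence matches if the action is the same and its subject
    is the query's subject or a more specific kind of it."""
    q_subject, q_action = extract_fact(query)
    s_subject, s_action = extract_fact(sentence)
    return q_action == s_action and is_a(s_subject, q_subject)


print(matches("I want dogs barking", "The poodle barks."))    # True
print(matches("I want poodles barking", "The dog barks."))    # False
```

The query never contains the word “poodle”, yet the first document matches because a poodle is a kind of dog. The reverse does not hold: a document about a dog in general is not an answer to a query about poodles.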
Then Ramez Naam gave some more explanation:
[…]So one of the things that Powerset brings that is unique is the ability to apply their search technology to the query, to the user’s search, in ways that are beyond just the simple pluralization or adding an “-ing”. Powerset also looks at the document, it looks at the words that are on a web page and this is actually very important. If you look at just the user’s query, what you have available to you to figure out what they are talking about are three words, four words, five words, maybe even less. That can give you certain hints. If you look at a web page that has hundreds or thousands of words on it you have a lot more information you can use if you understand it linguistically to tell what it’s about, what kind of queries it should match and what kind of queries it shouldn’t match[…]
A quick note about the answers Powerset gives:
We return answers. We actually synthesize, so if you were to say, “What did Tom Cruise star in,” you actually get not just the movies, but the cover art for the different movies. It synthesizes multiple pieces of information to give you a whole different kind of presentation. Or, if you were just to say, “Bill Gates” you’d be given an automatically generated profile of Bill Gates, pulled across many, many articles. It’s no longer just about 10 links, although we can certainly do a more relevant job (and will) of the blue links, and a better job of presenting those links. With the language understanding systems which we now have, we can go way beyond that and open up a whole new door in user experience until you think, “oh god, that’s how I used to search, now I want this whole new different kind of thing.”
It is also interesting to note that Microsoft will use Powerset’s natural language processing not only in the Live search engine but also in Word, for grammar.
Google doesn’t have to worry yet, but I bet they are also working hard on natural language processing…