Along those lines, I had an academic discussion this week about syntax and semantics in machine-to-machine interfaces (APIs), and how they correlate to user/consumer intent: specifically, whether you can infer user intent from captured REST calls. (FWIW, I believe you can infer intent, or at least some notion of it, although it might capture only a portion of the user's goal and may be unintentionally misaligned with the semantics of the call.)
With that conversation fresh in my mind, I found it amusing that the developer API for Echo specifically calls out "intent" and defines it as:
"In the context of Alexa apps, an intent represents a high-level action that fulfills a user’s spoken request. Intents can optionally have arguments called slots. Note that intents for Alexa apps are not related in any way to Android intents." - Echo Developer : Getting Started Guide
That made me nostalgic. I loved my days in NLP, and it's absolutely phenomenal to see how things have played out over the years...
(NOTE: what follows is almost entirely self-serving, and you may not get anything out of it. I am not responsible for any of the time you lose in reading it ;)
I think I was eighteen when I started working in the Natural Language Processing (NLP) group at Unisys. I was one of the many developers building those terrible voice recognition systems on the other end of the phone when you dialed in for customer service and received anything but. We frustrated our end users, but I got to work with some amazing people: Debbie Dahl, Bill Scholz, and Jim Irwin.
And we thought we were smart. We built all sorts of tools that helped map text into actions ("intents"), and newfangled web servers for voice recognition. They were good times, but frankly we were only inflicting pain on people. Voice recognition wasn't there yet. We spent our time engineering the questions so they would guide users toward answering with certain terms, which made things easier on the voice recognition system. (1995-1999ish)
For a time after that, I wanted nothing to do with voice recognition. I still loved NLP, though, and went on to build a system to automate email routing and responses for customer service. That worked because it was a numbers game: if the system was confident enough to answer 50% of the customer inquiries, humans didn't need to respond to that subset, which saved moolah. We went on to extend that into real-time instant messaging over the web (Kana IQ). Again, good times and lots of brain work/patents in the process, and we congratulated ourselves for figuring out how to map a Bayesian inference engine onto a grammar. (1999-2001ish) But there is no way we could have put that system directly in the hands of consumers with no human support.
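The "numbers game" hinges on a confidence threshold: answer automatically only when the classifier is sure, and hand everything else to a person. Below is a minimal sketch of that idea using a plain multinomial naive Bayes classifier. It is not the Kana system (which mapped Bayesian inference onto a grammar); the class name `NaiveBayesRouter`, the `"HUMAN"` fallback label, the 0.8 threshold, and the training messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRouter:
    """Hypothetical router: multinomial naive Bayes that only answers when confident."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()               # category -> number of training messages
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def posteriors(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for cat in self.doc_counts:
            total_words = sum(self.word_counts[cat].values())
            score = math.log(self.doc_counts[cat] / total_docs)  # log prior
            for w in words:
                # Add-one smoothing so an unseen word does not zero out a category.
                score += math.log((self.word_counts[cat][w] + 1) / (total_words + len(self.vocab)))
            log_scores[cat] = score
        # Normalize with log-sum-exp so the posteriors sum to 1.
        m = max(log_scores.values())
        z = sum(math.exp(s - m) for s in log_scores.values())
        return {cat: math.exp(s - m) / z for cat, s in log_scores.items()}

    def route(self, text):
        probs = self.posteriors(text)
        best = max(probs, key=probs.get)
        if probs[best] >= self.threshold:
            return best   # confident: send the canned response for this category
        return "HUMAN"    # not confident: hand off to a customer service agent


router = NaiveBayesRouter(threshold=0.8)
router.train("i forgot my password", "password_reset")
router.train("reset my password please", "password_reset")
router.train("where is my order", "order_status")
router.train("my order has not shipped", "order_status")

print(router.route("please reset my password"))  # password_reset
print(router.route("hello there"))               # HUMAN
```

Raising `threshold` shrinks the share of inquiries answered automatically but makes each automatic answer more trustworthy; that trade-off is what decides whether the system can cover a given fraction of the inbox.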
However, it was during this time that I started appreciating the differences among Syntax, Semantics, Intents, and Goals. Here are the straight-up definitions:
Syntax : the arrangement of words and phrases to create well-formed sentences in a language.
Semantics : the branch of linguistics and logic concerned with meaning.
Intent : the reason for which something is done or created or for which something exists.
Goal : the object of a person's ambition or effort; an aim or desired result.
Consider the process of taking a string of characters as input and converting it into actions that help the user achieve a goal. First, you need to consider syntax. In this part of the process, you convert a meaningless string of characters into related tokens. There are lots of ways to do this, and multiple pieces to the puzzle (e.g., part-of-speech tagging). Back at Brown, I had the pleasure of studying under Eugene Charniak, and his book Statistical Language Learning became a favorite of mine (after I initially hated it for a semester ;). Ever since that course, I've fallen back on context-free grammars (CFGs) and chart parsing to relate word tokens to each other in a sentence.
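For readers who haven't met chart parsing, here is a minimal sketch of a CKY chart parser. Each chart cell records which grammar symbols can cover a span of words, and the chart is filled bottom-up from single words to the whole sentence. The toy grammar (in Chomsky normal form) and lexicon are hypothetical, and unlike the probabilistic grammars in Charniak's book, this version keeps only the first derivation it finds for each symbol in each span instead of scoring alternatives.

```python
# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "Det", "Nom"),
    ("Nom", "Adj", "N"),
    ("VP", "V", "NP"),
]
# Hypothetical lexicon: word -> possible part-of-speech tags.
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "big": {"Adj"},
    "dog": {"N"}, "cat": {"N"},
    "chased": {"V"},
}


def cky_parse(words):
    """Return one parse tree rooted at S covering all words, or None."""
    n = len(words)
    # chart[i][j] maps a nonterminal to a tree covering words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for tag in LEXICON.get(w, ()):
            chart[i][i + 1][tag] = (tag, w)
    for span in range(2, n + 1):          # shorter spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # split point between the two children
                for parent, left, right in BINARY_RULES:
                    if left in chart[i][k] and right in chart[k][j] and parent not in chart[i][j]:
                        chart[i][j][parent] = (parent, chart[i][k][left], chart[k][j][right])
    return chart[0][n].get("S")


def bracket(tree):
    """Render a tree as a bracketed string, e.g. (NP (Det a) (N cat))."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"
    return f"({label} " + " ".join(bracket(c) for c in children) + ")"


tree = cky_parse("the big dog chased a cat".split())
print(bracket(tree))
# (S (NP (Det the) (Nom (Adj big) (N dog))) (VP (V chased) (NP (Det a) (N cat))))
```

A probabilistic version would store a score with each chart entry and keep the best-scoring derivation, which is how a parser chooses among ambiguous readings of the same sentence.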
Once you have related tokens, and you know the parts of speech (ADJ, NOUN, etc.), and how they relate (ADJ _modifies_
After Kana, I took a job with a company that recorded patient/doctor conversations, transcribed them, and then attempted to perform computational linguistics on them. As you can imagine, discerning between a mention of a *symptom* and a *side-effect* is incredibly difficult. You not only need solid parsing to associate the terms in the sentence, but you also need a knowledge base to know what those terms mean and the context in which they are being used. In this phase, we are assigning meaning to the terms (i.e., semantics). Sure, we used Hadoop for the NLP processing, but more horsepower didn't translate into better results... (now 2008-2010ish)
Even assuming you can properly assign semantics, you may misinterpret the intent of a communiqué. This happens all the time, even in human-to-human communication. English is a horrible language. And the same thing can happen with machines. However, you have to assume that, more often than not, if you can parse the sentence (i.e., assign syntax) and interpret its meaning (i.e., semantics), you can infer some notion of intent from the textual gibberish. Otherwise, communication is just fundamentally broken.
With all this in mind -- back to Echo.
Amazon nailed it. I see 20 years of my NLP frustration solved in a 12" cylinder. It rarely misses on the voice recognition. And they have a fantastic engine that takes phonemes to "intent", allowing developers to plug in at that highest layer. Boo yah. That's empowering. I might just build an app for Echo. Maybe an AWS integration: "Alexa, please expand my AWS cluster by 10 nodes". It'd be therapeutic. ;)
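To make the intent/slot vocabulary concrete, here is a toy sketch that maps the transcribed text of that request to an intent name plus slot values. This is not the Alexa Skills Kit API: the intent names `ExpandClusterIntent` and `ShrinkClusterIntent`, the `NodeCount` slot, the `resolve_intent` function, and the regex patterns are all hypothetical, and a real engine starts from speech and matches far more robustly than regular expressions do.

```python
import re
from dataclasses import dataclass, field


@dataclass
class IntentMatch:
    name: str
    slots: dict = field(default_factory=dict)


# Hypothetical intents: each name maps to sample patterns with named slot groups.
INTENT_PATTERNS = {
    "ExpandClusterIntent": [
        r"expand my (?:aws )?cluster by (?P<NodeCount>\d+) nodes?",
        r"add (?P<NodeCount>\d+) nodes? to my (?:aws )?cluster",
    ],
    "ShrinkClusterIntent": [
        r"shrink my (?:aws )?cluster by (?P<NodeCount>\d+) nodes?",
    ],
}


def resolve_intent(utterance):
    """Return the first matching IntentMatch for a transcribed utterance, or None."""
    text = utterance.lower().strip(" .!?")
    # Drop an optional wake word and politeness prefix, e.g. "alexa, please ".
    text = re.sub(r"^(alexa,?\s*)?(please\s+)?", "", text)
    for name, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            m = re.fullmatch(pattern, text)
            if m:
                # Every slot in this toy grammar is a digit string, so convert to int.
                slots = {k: int(v) for k, v in m.groupdict().items()}
                return IntentMatch(name, slots)
    return None


print(resolve_intent("Alexa, please expand my AWS cluster by 10 nodes"))
# IntentMatch(name='ExpandClusterIntent', slots={'NodeCount': 10})
```

The developer only ever sees the `IntentMatch`; everything below it (speech, syntax, semantics) is the platform's problem, which is exactly what makes plugging in at that layer so appealing.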