Behzad Golshan

Working on AI, LLMs & the inevitable 🤖 uprising

NAACL 2018 Talks (Summary)

A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding (from the Alexa team)

The talk basically explained how Alexa detects intents and performs slot-filling. A lot of what they talked about is directly related to FrameIt. There were two main points in the talk:

  • Similar to FrameIt, they first do intent classification and then slot-filling. But, like us, they want to see if slots can help with intent classification. To achieve this, they first shortlist the top K intents according to the ML model, then carry out slot-filling for each of these intents, and finally pass the K candidates along with their extracted slots to another classifier to decide which one makes the most sense. That classifier can now use the signal from the extracted slots to make a better judgment (see the sketch after this list).
  • They mentioned that intent detection for them works hierarchically. For instance, if you say “Play the thriller for me”, they don’t map it directly to an intent. They first decide that it’s a command rather than a question, then that it asks to play something back, and then whether that something is the thriller movie, the song “Thriller”, or something else. They said this hierarchical exploration is essential; otherwise there are too many intents to build a single classifier over.
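Here is a minimal sketch of the shortlist-then-rerank pipeline as I understood it. All of the scoring functions, scores, and intent names are toy stand-ins I made up; the real system uses neural classifiers at each stage.

```python
from typing import Dict, List, Tuple

def shortlist_intents(utterance: str, k: int = 3) -> List[Tuple[str, float]]:
    """Stage 1: score all intents and keep the top-K candidates."""
    scores = {"PlayMusic": 0.6, "PlayMovie": 0.3, "AskQuestion": 0.1}  # toy scores
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

def fill_slots(utterance: str, intent: str) -> Dict[str, str]:
    """Stage 2: run intent-specific slot-filling for each candidate."""
    if intent == "PlayMusic":
        return {"SongName": "Thriller"}
    if intent == "PlayMovie":
        return {"MovieTitle": "Thriller"}
    return {}

def rerank(candidates: List[Tuple[str, float, Dict[str, str]]]) -> str:
    """Stage 3: rescore candidates using the extracted slots as extra signal."""
    def score(c):
        intent, prior, slots = c
        return prior + 0.2 * len(slots)  # reward candidates whose slots got filled
    return max(candidates, key=score)[0]

utterance = "Play the thriller for me"
candidates = [(intent, p, fill_slots(utterance, intent))
              for intent, p in shortlist_intents(utterance)]
print(rerank(candidates))  # -> "PlayMusic" in this toy setup
```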

Self-Training for Jointly Learning to Ask and Answer Questions (from CMU)

The idea here was simple and neat. They pointed out that people work on QA and question generation as two separate problems, even though the two are tightly connected. They showed how you can create new questions using existing question-generation techniques and use them as training data to improve your QA system, and how you can use a QA system to train a better question-generation model. The technical part of the talk focused on keeping this cycle stable so it doesn’t drift into meaningless questions and answers.
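Below is a rough sketch of that self-training loop. The ToyQA and ToyQG classes are placeholders of my own making (the paper uses neural models); the interesting part is the filtering step, which only keeps generated pairs the QA model verifies, and which is what keeps the cycle from drifting.

```python
class ToyQA:
    def answer(self, passage, question):
        return passage.split()[-1]              # pretend "answer"
    def confidence(self, passage, question, answer):
        return 0.9 if answer in passage else 0.1
    def train(self, pairs):
        pass                                    # would update parameters

class ToyQG:
    def generate(self, passage):
        return ("What is mentioned last?", passage.split()[-1])
    def train(self, pairs):
        pass

def self_training_round(qa, qg, passages, threshold=0.8):
    # 1) QG proposes new (question, answer) pairs from unlabeled passages.
    pairs = [(p, *qg.generate(p)) for p in passages]
    # 2) Keep only pairs the QA model answers confidently and consistently;
    #    this filtering is what keeps the cycle stable.
    clean = [(p, q, a) for (p, q, a) in pairs
             if qa.confidence(p, q, a) >= threshold and qa.answer(p, q) == a]
    # 3) Both models retrain on the surviving pairs.
    qa.train(clean)
    qg.train(clean)
    return clean

print(self_training_round(ToyQA(), ToyQG(), ["the cat sat on the mat"]))
```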

The Web as a Knowledge-base for Answering Complex Questions (from Tel Aviv University)

Their goal was to answer complex questions using the web, e.g., “What is the population of the richest country in Europe?”. They mentioned that at the moment Google does not handle such questions, but by decomposing them into logical forms and sending each smaller chunk to Google, you can. Using an off-the-shelf semantic parser, they realize they first need to find the “richest country in Europe” (which Google can easily accomplish) and then answer “What is the population of Germany”, which again Google is good at answering.
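Here is a minimal sketch of the decompose-and-query idea. The decompose and web_search functions are hypothetical stand-ins with canned outputs; the actual system learns the decomposition and queries a real search engine.

```python
def web_search(query: str) -> str:
    """Stand-in for asking a search engine a simple question."""
    canned = {
        "richest country in Europe": "Germany",
        "What is the population of Germany": "about 83 million",
    }
    return canned.get(query, "unknown")

def decompose(question: str):
    """Stand-in for the semantic parser: split a composed question into
    an inner sub-question and an outer question template."""
    return "richest country in Europe", "What is the population of {}"

def answer_complex_question(question: str) -> str:
    sub_question, template = decompose(question)
    entity = web_search(sub_question)            # step 1: "Germany"
    return web_search(template.format(entity))   # step 2: plug it back in

print(answer_complex_question(
    "What is the population of the richest country in Europe?"))
```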

Semantic Structural Evaluation for Text Simplification (from The Hebrew University of Jerusalem)

This was a 12-minute talk, so many parts remained unclear to me. They proposed a new method for evaluating text simplification. I found this interesting as we’ve also discussed simplifying happy moments and similar data in the lab. Their method uses a semantic parser, namely UCCA, which is focused on extracting what they call a “Scene”. For example, “walking” would be a scene in “I went for a walk”. They use the notion of a Scene to measure how many of the original text’s Scenes the simplified text still conveys. I’m skeptical about how well this can recognize that “We had a great chat with coworkers over lunch” and “I chatted with coworkers” are similar. It all comes down to how they define scenes.
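To make my skepticism concrete, here is a toy version of scene-overlap scoring (my own construction, not the paper’s actual metric). If scenes come out as symbolic predicates, a surface difference like “we” vs. “I” can already break the match:

```python
def scene_overlap(source_scenes, simplified_scenes):
    """Fraction of the source's scenes preserved in the simplification."""
    preserved = set(source_scenes) & set(simplified_scenes)
    return len(preserved) / len(set(source_scenes))

# Suppose a UCCA-style parser reduced each sentence to a set of scenes:
source = {"chat(we, coworkers)", "eat(we, lunch)"}
simplified = {"chat(I, coworkers)"}
print(scene_overlap(source, simplified))  # 0.0 -- unless "we"/"I" are normalized
```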

Specialising Word Vectors for Lexical Entailment (from University of Cambridge)

I’m not going to dive into details, but they’ve proposed a new technique for learning word vectors that also accounts for lexical entailment (i.e., knowing that “bird” IsA “animal”). Their proposed embeddings capture the similarity of words using the cosine similarity of the vectors, which means they tune the direction of each vector in the space. For lexical entailment, they use the length of each vector, making sure that “animal” ends up with a larger norm than “bird”. Combining the directions of the vectors with their lengths, they can exploit both word similarity and entailment.
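A small worked example of the geometry: direction encodes similarity, length encodes generality. The 2-D vectors below are made up purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity: depends only on the vectors' directions."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def norm(v):
    return math.hypot(*v)

bird   = (0.6, 0.8)   # same direction as "animal", but shorter
animal = (1.5, 2.0)   # same direction, longer => more general concept
car    = (0.9, -0.4)  # different direction => dissimilar meaning

print(cosine(bird, animal))        # ~1.0: highly similar
print(cosine(bird, car))           # ~0.22: dissimilar
print(norm(animal) > norm(bird))   # True: consistent with "bird" IsA "animal"
```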

Colorless Green Recurrent Networks Dream Hierarchically (from Facebook AI)

They studied a simple question: do RNNs learn syntax? I found this interesting since we’ve talked many times about including POS tags as additional features. They concluded that RNNs are good at learning syntax, which makes me believe there is really no point in providing syntactic features to these models (at least in the presence of enough training data). If you’re wondering about the title: they wanted to make sure the RNN was learning syntax rather than inferring anything from word semantics, so they used meaningless yet grammatically correct sentences in the spirit of the well-known “Colorless green ideas sleep furiously”.

Sentiment Analysis: It’s Complicated! (from McGill University)

This was a really cool study, more about crowdsourcing than sentiment analysis. They collected sentiment labels for tweets using the usual positive, negative, and neutral classes. Obviously, the workers did not always agree. They realized that if they keep only data points with at least a weak majority agreement, they need to throw away 10% of their data, and if they enforce a stronger agreement (4 out of 5 votes), they need to throw away 35% (see the sketch at the end of this section). Nevertheless, they trained sentiment classifiers using these two datasets after removing the disagreements. Then they re-labeled the data, this time adding a fourth category called “complicated”. Using this new data, they had two findings:

  • The classifier trained on these 4 categories performs much better on test data than the classifiers trained on 3 classes (with the disagreements removed).
  • Simply marking the disagreements as “complicated” doesn’t actually work: those items do not necessarily align with what people themselves label as complicated.

Based on this, they suggested that in most crowdsourcing tasks we should avoid dropping the problematic data, since keeping it under a label such as “complicated” can help improve the system.
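Here is a small sketch of the agreement-threshold filtering described above. The tweets and vote counts are invented; the point is just how each threshold trades data volume for label quality.

```python
from collections import Counter

def filter_by_agreement(labeled_tweets, min_votes):
    """Keep tweets whose majority label has at least min_votes of 5 votes."""
    kept = []
    for text, votes in labeled_tweets:
        label, count = Counter(votes).most_common(1)[0]
        if count >= min_votes:
            kept.append((text, label))
    return kept

data = [
    ("great day!",  ["pos", "pos", "pos", "pos", "neu"]),
    ("it's monday", ["neu", "neu", "neg", "pos", "neu"]),
    ("hmm ok then", ["pos", "neg", "neu", "neg", "pos"]),
]
print(len(filter_by_agreement(data, 3)))  # weaker agreement keeps more data (2)
print(len(filter_by_agreement(data, 4)))  # 4-of-5 agreement discards more (1)
```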

Deep Contextualized Word Representations (from AI2) -- BEST PAPER

This paper is also known as the “ELMo” paper and received the best-paper award. It is indeed a very influential paper, achieving a new state of the art on multiple NLP tasks. The talk was quite technical, and I need to consult the paper to make sure I fully understand the details. At a high level, the paper proposes a new word-vector model that does not assign a fixed vector to each word. Instead, the embedding of a word is dynamic and changes depending on the context it appears in. As a result, “stork” might have a different embedding depending on whether you are talking about wildlife or (delivering) babies! They’ve shown that plugging these embeddings into some of the competitive models yields state-of-the-art results. Since the embedding of each word depends on its context, they use LSTMs to summarize the context to the left and to the right of that word in that particular sentence.
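Here is a minimal PyTorch sketch of the core idea as I understood it. This is a simplification of my own, not the released model: the real ELMo pretrains a large bidirectional language model and learns task-specific weightings of its layers. The point is just that running a bidirectional LSTM over the sentence makes each word’s vector depend on its context.

```python
import torch
import torch.nn as nn

vocab = {"the": 0, "stork": 1, "delivered": 2, "baby": 3, "ate": 4, "a": 5, "fish": 6}
embed = nn.Embedding(len(vocab), 8)                       # static word vectors
bilstm = nn.LSTM(input_size=8, hidden_size=8,
                 bidirectional=True, batch_first=True)    # context summarizer

def contextual_vectors(sentence):
    ids = torch.tensor([[vocab[w] for w in sentence]])
    static = embed(ids)              # context-independent embeddings
    contextual, _ = bilstm(static)   # context-dependent embeddings
    return contextual[0]

a = contextual_vectors(["the", "stork", "delivered", "the", "baby"])
b = contextual_vectors(["the", "stork", "ate", "a", "fish"])
# "stork" (position 1) gets a different vector in each sentence:
print(torch.allclose(a[1], b[1]))  # False
```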

There was much more going on at NAACL, but I think this is a good point to stop 🙂