I Know What You Want to Express:

Sentence Element Inference by Incorporating External Knowledge Base

Abstract

Sentence auto-completion is an important feature that saves users keystrokes by suggesting completions as they type. Despite its value, existing sentence auto-completion methods, such as query completion models, can hardly be applied to the object completion problem in sentences of the form (subject, verb, object), owing to the complexity of natural language descriptions and to data deficiency. To address this problem, we treat an SVO sentence as a three-element triple (subject, sentence pattern, object) and cast sentence object completion as an element inference problem. The elements of all triples are encoded into a unified low-dimensional embedding space by our proposed TRANSFER model, which leverages an external knowledge base to strengthen representation learning. With these representations, a linear model provides reliable candidates for the missing element. Extensive experiments on a real-world dataset validate our model. We have also applied the model to factoid question answering systems for answer candidate selection, which further demonstrates the applicability of TRANSFER.
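To make the idea of element inference in an embedding space concrete, here is a minimal sketch of ranking candidate objects for a given subject and sentence pattern. It is not the TRANSFER model: as a stand-in it uses a TransE-style translation score (subject + pattern should land near the object), and the vectors, the candidate names and the functions `score` and `rank_objects` are all hypothetical. The actual model and its scoring function are defined in the paper.

```python
import math


def score(subject_vec, pattern_vec, object_vec):
    """Negative Euclidean distance between (subject + pattern) and object.

    This is a TransE-style stand-in, NOT the TRANSFER scoring function.
    A higher score means a more plausible object.
    """
    squared = sum(
        (s + p - o) ** 2
        for s, p, o in zip(subject_vec, pattern_vec, object_vec)
    )
    return -math.sqrt(squared)


def rank_objects(subject_vec, pattern_vec, candidates):
    """Return candidate names sorted from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: score(subject_vec, pattern_vec, candidates[name]),
        reverse=True,
    )


# Toy 2-dimensional embeddings, invented purely for illustration.
subject = [0.1, 0.2]
pattern = [0.3, -0.1]
candidates = {
    "obj_a": [0.4, 0.1],
    "obj_b": [-0.5, 0.9],
    "obj_c": [0.5, 0.3],
}

ranking = rank_objects(subject, pattern, candidates)
print(ranking)
```

In this toy setup the candidate whose embedding lies closest to subject + pattern is ranked first; a trained model would learn these vectors from the sentence triples and the knowledge base.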

Data Description

Here we introduce the sentence auto-completion data, which includes:

Freebase Subgraph Dataset.

This dataset contains 5,170,340 entities and 7,152 relations. The data is in the form of triples (head, relation, tail), with 140,785,671 triples in total.
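A minimal sketch of reading such triples might look like the following. The on-disk layout is not specified here, so the sketch assumes one triple per line with tab-separated fields. The names `Triple`, `parse_triples` and `index_by_head`, and the sample identifiers, are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def parse_triples(lines: Iterable[str], sep: str = "\t") -> Iterator[Triple]:
    """Yield triples from lines of the ASSUMED form head<sep>relation<sep>tail."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts = line.split(sep)
        if len(parts) != 3:
            continue  # skip lines that do not have exactly three fields
        yield Triple(*parts)


def index_by_head(triples: Iterable[Triple]) -> dict:
    """Group (relation, tail) pairs under their head entity."""
    index = defaultdict(list)
    for t in triples:
        index[t.head].append((t.relation, t.tail))
    return dict(index)


# Invented sample lines; real entity and relation identifiers will differ.
sample_lines = [
    "e1\tr1\te2\n",
    "e1\tr2\te3\n",
    "malformed line\n",
    "e2\tr1\te3\n",
]

triples = list(parse_triples(sample_lines))
by_head = index_by_head(triples)
print(len(triples), by_head["e1"])
```

With roughly 140 million triples, streaming the file line by line through a generator such as `parse_triples` avoids loading everything into memory at once. Building a full in-memory index as `index_by_head` does may only be practical on a filtered subset.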

Wikipedia Sentence Dataset.

This dataset contains 5,793 English Wikipedia sentences. For each sentence, the subject and the object are matched to KB entities, and the relation paths have been computed. The data is in JSON format.

Question Dataset.

This dataset contains 254 questions generated from the sentence corpus. For each question, the topic entity and the answer entity are identified and exactly matched to Freebase entities.

The data can be downloaded at:

Code Download

This code is implemented with pylearn2, a Theano-based deep learning library. Please install Theano and pylearn2 first, following their official guides. More experimental results and the optimal parameters are presented in the paper.

Click here to download.