Montreal’s Maluuba has created what it is calling the first human-made question-answering dataset for artificial intelligence (AI) research.
Maluuba is a deep-learning company that works on helping machines think, reason and communicate with human-like intelligence.
We recently reported that Maluuba had teamed up with researchers at McGill University to teach machines how to understand common sense. This latest development appears to tie into that work.
The startup is releasing two “sophisticated natural language understanding datasets.”
The human-created datasets were developed for machine reading comprehension, goal-oriented dialogue systems and conversational interface research. They explore fundamental aspects of human capabilities in literacy and conversation.
“We believe that language understanding is fundamental to solving artificial intelligence,” said Kaheer Suleman, cofounder and CTO of Maluuba. “Our hope is that the Maluuba datasets will push forward the field of AI and natural language, so that collectively we can reach our goal of a world where machines communicate intuitively with humans.”
The first dataset, NewsQA, was developed to train algorithms capable of answering complex questions that require human-level comprehension and reasoning skills. To build it, the company drew on CNN news articles from the DeepMind Q&A Dataset and crowd-sourced a machine reading corpus of 120,000 question-answer pairs. Answering the questions requires reasoning, such as synthesis, inference and handling ambiguity, whereas other datasets have focused on larger volumes of simpler questions.
The second dataset, Frames, consists of 19,986 turns that can be used to help train deep-learning algorithms on natural conversations. The text-based conversations were recorded between two humans, simulating conversation between a vacation seeker and a travel agent.
“This is an important new dataset that extends standard dialogue tasks into areas such as comparison and exploration of different customer options,” said Dr. Oliver Lemon, Professor, School of Mathematical and Computer Sciences (MACS), Heriot-Watt University. “Building conversational systems which can support such tasks is a fascinating challenge, and this dataset will help us to do that.”
“Having access to datasets such as Maluuba’s Frames is invaluable in helping AI researchers drive breakthroughs in goal-oriented dialogue,” said Dr. Verena Rieser, Associate Professor, School of Mathematical and Computer Sciences (MACS), Heriot-Watt University. “At the MACS Interaction Lab, this dataset will greatly benefit the academic research we are conducting in spoken dialogue systems and response generation.”
Maluuba’s datasets are available here.