A Pragmatic Approach to Structured Data Querying via Natural Language Interface

As data analysis becomes integral to more and more businesses, the ability to access and interpret data quickly has become more important than ever.

Why build a natural language interface for structured data querying?

The information retrieval tools companies use today claim to democratize data. In reality, they are complex: formulating a valid query requires knowledge of a query language such as SQL, strong analytical skills, extensive training, and familiarity with the underlying data structure. Business users can rarely work with these systems without help from a skilled business analyst.

Companies need to employ business analytics teams to help non-data professionals interact with enterprise data. These teams typically have an ever-growing reporting backlog; as a result, even a simple question may take days to answer.

To reduce the burden on already overstretched data teams, many organizations are looking for self-service tools that let non-developers query databases in natural language, without needing a data analyst for every report.

FriendlyData’s approach to structured data querying via a natural language interface

At FriendlyData, we are building a natural language interface for database querying. Our product translates natural language questions into the corresponding SQL queries, making data easily accessible to everyone in a company.

FriendlyData’s query translation algorithm is, to our knowledge, more accurate than other known NLP query parsers. On the WikiSQL question corpus it achieved a precision of 74.5%, outperforming the leading neural-network-based models.

FriendlyData uses a grammar-based parser and rule-based text-to-SQL translation. Besides high accuracy, our approach provides additional benefits over machine learning methods:

  • works more robustly with complex data operations such as nested grouping, time series calculations, functions with multiple parameters, etc.
  • enables generic search (data can be searched without knowing the exact column and with no access to a database)
  • doesn't require a massive training corpus
  • allows the translation rules and the synonym base to be extended
  • supports more data types (date and boolean)
  • makes it easier to integrate new functions and data types with no significant changes in the core system
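
To make the general idea of rule-based translation with a synonym base more concrete, here is a minimal Python sketch. It is an illustration only and is not FriendlyData's algorithm, which is described in the paper. The table name `orders`, the column names, the two synonym dictionaries, the single grammar rule, and the `translate` function are all hypothetical and chosen just for this example.

```python
import re

# Hypothetical schema and synonym base (assumptions for illustration only).
TABLE = "orders"
COLUMN_SYNONYMS = {
    "amount": "amount",
    "revenue": "amount",
    "sales": "amount",
    "customer": "customer",
    "client": "customer",
    "region": "region",
}
AGGREGATE_SYNONYMS = {
    "total": "SUM",
    "sum": "SUM",
    "average": "AVG",
    "mean": "AVG",
    "maximum": "MAX",
    "minimum": "MIN",
}

# One toy grammar rule: [prefix] <aggregate> <column> [by|per <column>] [?]
RULE = re.compile(
    r"^(?:what is the |show me the |show )?"
    r"(?P<agg>\w+) (?P<col>\w+)"
    r"(?: (?:by|per) (?P<group>\w+))?\??$"
)


def translate(question: str) -> str:
    """Translate a question matching the toy rule into a SQL string."""
    match = RULE.match(question.strip().lower())
    if match is None:
        raise ValueError(f"no rule matches: {question!r}")

    # Resolve the user's words through the synonym base.
    agg = AGGREGATE_SYNONYMS.get(match["agg"])
    col = COLUMN_SYNONYMS.get(match["col"])
    if agg is None or col is None:
        raise ValueError(f"unknown aggregate or column in: {question!r}")

    group_word = match["group"]
    if group_word is None:
        return f"SELECT {agg}({col}) FROM {TABLE}"

    group = COLUMN_SYNONYMS.get(group_word)
    if group is None:
        raise ValueError(f"unknown grouping column: {group_word!r}")
    return f"SELECT {group}, {agg}({col}) FROM {TABLE} GROUP BY {group}"


if __name__ == "__main__":
    print(translate("Total sales"))
    print(translate("What is the average revenue per client?"))
```

In this sketch, supporting a new word only requires a new dictionary entry (for example, mapping "turnover" to `amount`), and supporting a new question shape only requires a new rule. No retraining is involved, which is the kind of extensibility the list above refers to. A production system would need a real grammar, type handling, and safe query construction, none of which is shown here.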

In this post, we want to share our research paper, "A pragmatic approach to structured data querying via natural language interface". In it, we describe our algorithm in detail and discuss several factors that can dramatically affect the system architecture and the choice of algorithms used to translate natural language (NL) queries into a structured query representation.

Our primary goal is to help companies find the best solution when both high-quality query translation and a highly secure architecture are required. Our method is designed for real-life business cases, where data security, time, scalability, and accuracy are mission-critical. Our approach will not be the best in every case, but we aim to show which factors really matter for enterprises in real-life scenarios.

You can find the full paper on arxiv.org. Be sure to follow us on Twitter as well, where we share the latest thinking in Natural Language Processing along with company news, papers, and other useful resources.

Democratize your data

FriendlyData addresses one of the key challenges in the world of enterprise data: building the power of data and analytics into day-to-day decision-making.

The solution we offer is data democratization. FriendlyData makes data accessible to everyone by providing a user-friendly natural language search interface for databases.

Foster a data-driven culture in your organization with us!