Does it make sense to interrogate structured data using NLP?

Question

I know that this question may not be suitable for SO, but please let it stay here for a while. The last time one of my questions was moved to Cross Validated, it froze: no more views or feedback.

I came across a question that does not make much sense to me: how can IFC models be interrogated via NLP? Consider IFC models as semantically rich structured data. IFC defines an EXPRESS-based entity-relationship model consisting of entities organized into an object-based inheritance hierarchy. Examples of entities include building elements, geometry, and basic constructs.
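
To make "entities organized into an object-based inheritance hierarchy" concrete, here is a minimal Python sketch. The classes are heavily simplified, hypothetical stand-ins named after real IFC entities (intermediate levels and almost all attributes are skipped). The point is that a query like "all building elements" is answered exactly by following the hierarchy, with no language understanding involved. Toolkits such as IfcOpenShell offer a comparable by-type lookup (`by_type`) on real IFC files.

```python
from dataclasses import dataclass

# Simplified stand-ins for a few IFC entities. The real schema is defined
# in EXPRESS and has more levels and attributes; only the shape of the
# inheritance hierarchy is mimicked here.
@dataclass
class IfcRoot:
    global_id: str
    name: str

@dataclass
class IfcProduct(IfcRoot):
    pass

@dataclass
class IfcBuildingElement(IfcProduct):
    pass

@dataclass
class IfcWall(IfcBuildingElement):
    pass

@dataclass
class IfcDoor(IfcBuildingElement):
    pass

@dataclass
class IfcSpace(IfcProduct):
    pass

def entities_of_type(model, entity_type):
    """Return every entity that is an instance of entity_type or of a subtype."""
    return [e for e in model if isinstance(e, entity_type)]

model = [
    IfcWall("w1", "Wall 01"),
    IfcDoor("d1", "Door 01"),
    IfcSpace("s1", "Room 101"),
]

print([e.name for e in entities_of_type(model, IfcBuildingElement)])
```

Asking for `IfcBuildingElement` returns the wall and the door but not the space, purely because of where each class sits in the hierarchy.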

How could NLP be used for this type of data? I don't see how NLP is relevant at all.

I'm actually interested in answers to this, so I upvoted and commented. Good luck. — Filip Malczak, Nov 04 '16 at 21:34
Converting machine-readable data into human-readable text is relevant to NLP. For example, this paper: http://gup.ub.gu.se/records/fulltext/202121/202121.pdf — Mehdi, Nov 04 '16 at 22:52
I don't think they want to convert it to human-readable data. — Thoran, Nov 05 '16 at 09:48
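
For readers unfamiliar with the direction Mehdi mentions (structured data to text), a minimal template-based sketch is below. The record fields (`name`, `type`, `storey`, `width_mm`) are hypothetical and not taken from IFC or from the linked paper; real data-to-text systems are far more sophisticated than a single template.

```python
def verbalize(entity: dict) -> str:
    """Render a structured record as a simple English sentence via a template.

    The field names are illustrative assumptions, not a real schema.
    """
    sentence = f"{entity['name']} is a {entity['type']}"
    if "storey" in entity:
        sentence += f" on {entity['storey']}"
    if "width_mm" in entity:
        sentence += f", {entity['width_mm']} mm wide"
    return sentence + "."

door = {"name": "Door 01", "type": "door", "storey": "Level 1", "width_mm": 900}
print(verbalize(door))  # Door 01 is a door on Level 1, 900 mm wide.
```

Note that this is generation (data in, text out), not interrogation, which is the distinction Thoran's reply draws.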

Special Sauce · Accepted Answer · 2016-11-26T08:45:57.317

In general, I would suggest that using NLP techniques to "interrogate" already (quite formally) structured data like EXPRESS would be overkill at best and a time and maintenance sinkhole at worst. The strengths of NLP (resolving ambiguity in human language, coreference resolution, text summarization, textual entailment, etc.) are wholly unnecessary when you already have an unambiguous encoding like this. If anything, you could imagine translating this schema directly into a Prolog application for direct logic queries (which is quite a different direction from NLP).
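
As a rough illustration of the Prolog direction, here is a Python sketch that mimics Prolog facts and a recursive rule (kept in Python rather than Prolog for consistency). The relation names `subtype`, `instance` and `contained_in` and the tiny model are hypothetical; a real translation of the EXPRESS schema would be much larger. What matters is that every answer is exact and deterministic.

```python
# Prolog-style facts as Python tuples. In Prolog these would be facts
# such as subtype(ifc_wall, ifc_building_element). Names are hypothetical.
SUBTYPE = {
    ("IfcWall", "IfcBuildingElement"),
    ("IfcDoor", "IfcBuildingElement"),
    ("IfcBuildingElement", "IfcProduct"),
    ("IfcSpace", "IfcProduct"),
}
INSTANCE = {("w1", "IfcWall"), ("d1", "IfcDoor"), ("s1", "IfcSpace")}
CONTAINED_IN = {("w1", "s1"), ("d1", "s1")}

def supertypes(t):
    """Rule: ancestor(T, A) :- subtype(T, A).
             ancestor(T, A) :- subtype(T, P), ancestor(P, A).
    Returns t itself plus all of its transitive supertypes."""
    result = {t}
    frontier = [t]
    while frontier:
        current = frontier.pop()
        for child, parent in SUBTYPE:
            if child == current and parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result

def query_is_a(entity_type):
    """Query: ?- instance(X, T0), ancestor_or_self(T0, entity_type)."""
    return sorted(x for x, t in INSTANCE if entity_type in supertypes(t))

def query_in_space(entity_type, space):
    """Query: ?- is_a(X, entity_type), contained_in(X, space)."""
    return [x for x in query_is_a(entity_type) if (x, space) in CONTAINED_IN]

print(query_is_a("IfcBuildingElement"))  # ['d1', 'w1']
print(query_in_space("IfcDoor", "s1"))   # ['d1']
```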

I searched for the reference you may have had in mind. The only item I found was Extending Building Information Models Semiautomatically Using Semantic Natural Language Processing Techniques:

... the authors propose a new method for extending the IFC schema to incorporate CC-related information, in an objective and semiautomated manner. The method utilizes semantic natural language processing techniques and machine learning techniques to extract concepts from documents that are related to CC [compliance checking] (e.g., building codes) and match the extracted concepts to concepts in the IFC class hierarchy.

So in this example, at least, the authors are not "interrogating" the IFC schema with NLP, but rather using NLP to augment existing schemas with information extracted from human-readable text. This makes much more sense. If you post the actual URL or reference that contains the "NLP interrogation" phrase, I should be able to comment more specifically.
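
To show what "match the extracted concepts to concepts in the IFC class hierarchy" means in the simplest possible terms, here is a toy sketch. It is not the paper's method (which uses semantic NLP and machine learning); it swaps in plain token overlap (Jaccard similarity) between an already-extracted term and the words of a few real IFC class names. The threshold of 0.3 is an arbitrary assumption.

```python
import re

# A handful of real IFC class names; a real system would load the full schema.
IFC_CLASSES = ["IfcWall", "IfcCurtainWall", "IfcDoor", "IfcWindow", "IfcStair", "IfcSlab"]

def class_tokens(name):
    """'IfcCurtainWall' -> {'curtain', 'wall'}: drop the prefix, split CamelCase."""
    if name.startswith("Ifc"):
        name = name[len("Ifc"):]
    return {w.lower() for w in re.findall(r"[A-Z][a-z]*", name)}

def match_concept(term, classes=IFC_CLASSES, threshold=0.3):
    """Return the class whose name tokens best overlap the term (Jaccard), or None."""
    term_tokens = set(term.lower().split())
    best, best_score = None, 0.0
    for cls in classes:
        tokens = class_tokens(cls)
        score = len(term_tokens & tokens) / len(term_tokens | tokens)
        if score > best_score:
            best, best_score = cls, score
    return best if best_score >= threshold else None

for term in ["curtain wall", "fire door", "exterior wall", "handrail height"]:
    print(term, "->", match_concept(term))
```

"handrail height" maps to nothing here, which is exactly the kind of gap the paper proposes to fill by extending the schema.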

Edit:

The project grant abstract you referenced does not contain much in the way of details, but it does include this sentence:

... The information embedded in the parametric 3D model is intended for facility or workplace management using appropriate software. However, this information also has the potential, when combined with IoT sensors and cognitive computing, to be utilised by healthcare professionals in Ambient Assisted Living (AAL) environments. This project will examine how as-constructed BIM models of healthcare facilities can be interrogated via natural language processing to support AAL. ...

I can only speculate about why one might use an NLP framework for this purpose:

While BIM formats include Industry Foundation Classes (IFC) and aecXML, there are many dozens of other formats, many of them proprietary; some are CAD-integrated and others are standalone. Rather than pay for many proprietary licenses (some of these enterprise products are quite expensive) and/or spend the time to implement proper structured queries against the various file format specifications (which may not be publicly available for proprietary formats), the authors may have chosen a more automated, general solution to extract the content they are looking for (which I assume must be text or textual tags in nearly all cases). This would be akin to a search engine "scraping" websites for key words or phrases and their synonyms.

The upside is that they don't have to code explicitly against every possible BIM file format to get good coverage, nor pay out large sums of money. The downside is that they open up the new issues that come with NLP, including training, validation, supervision, etc. And NLP will never match the accuracy you could obtain from a true structured query against a known schema.
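
The accuracy trade-off can be illustrated with a deliberately crude sketch. The element labels, types and synonym list below are all hypothetical, and keyword matching is far simpler than a real NLP pipeline, but it shows the two failure modes a text-based approach has to contend with: a false positive (a wall whose label mentions a door) and a miss (a door labelled only with a code). The structured query, which reads the parsed entity type, has neither problem.

```python
import re

# Hypothetical free-text labels pulled out of heterogeneous BIM exports,
# each with the entity type a schema-aware parser would have read.
ELEMENTS = [
    {"label": "Entrance door, level 1", "type": "IfcDoor"},
    {"label": "Fire exit (north)", "type": "IfcDoor"},
    {"label": "D-104", "type": "IfcDoor"},
    {"label": "Wall next to door D2", "type": "IfcWall"},
    {"label": "Sliding glass partition", "type": "IfcWall"},
]

# Hand-picked synonym list (an assumption); real systems need much more.
DOOR_TERMS = {"door", "doorway", "exit", "entrance"}

def keyword_search(elements, terms):
    """Scraping-style match: keep labels containing any of the terms as a word."""
    hits = []
    for e in elements:
        words = set(re.findall(r"[a-z]+", e["label"].lower()))
        if words & terms:
            hits.append(e["label"])
    return hits

def structured_query(elements, entity_type):
    """Schema-aware match: uses the parsed entity type, not the label text."""
    return [e["label"] for e in elements if e["type"] == entity_type]

print(keyword_search(ELEMENTS, DOOR_TERMS))
print(structured_query(ELEMENTS, "IfcDoor"))
```

A better text model would reduce these errors, but the structured query is exact by construction whenever the schema is known.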

https://www.google.se/url?sa=t&source=web&rct=j&url=http://www.dit.ie/media/documents/study/postgraduateresearch/selffundedprojects/Project%2520Ad%2520IBM%2520BIM%2520UPDATED.docx&ved=0ahUKEwiJkaS89sXQAhVGMJoKHQzeCKAQFggiMAE&usg=AFQjCNEtGY7_y7VPXGpJ5LZbiRZI_2XRlA&sig2=FWd9rewbWmCMoIiyrMaVkQ — Thoran, Nov 26 '16 at 07:45
