
I'm using regular expressions to find certain patterns in natural language processing.

I find myself using the same patterns over and over, and since these patterns can be hard to read as terse regular expressions, I'm asking myself whether I should develop a higher-level pattern language that captures them.

Will I be digging myself into a hole trying to define a DSL like that? What's a good framework for developing such a language, and how much effort should I expect building it to take? What are some common pitfalls in defining and building such a language?

It could look something like this: [views] overlooking [the] ($object)

That would capture text such as "overlooking the ocean" or "overlooking cityscapes".

Another example could be ($granite) counter[- ]tops, which would capture text such as "granite countertops" or "quartz counter-tops" (but not "granite counter" or just "countertops").
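
To make the idea concrete, here is a minimal sketch of how such a notation might be translated into an ordinary regular expression. Everything in it is an assumption for illustration: the function name `compile_pattern`, the `lexicon` dictionary that says which words a `$name` may stand for, and the rule that a bracket holding only letters is an optional word while any other bracket is an optional character class. That last rule is a guess needed to make both examples work, and it hints at one pitfall: `[the]` and `[- ]` look the same but mean different things.

```python
import re

# One token of the pattern language: [optional], ($capture), whitespace, or literal text.
TOKEN = re.compile(
    r"\[(?P<opt>[^\]]+)\]|\(\$(?P<var>\w+)\)|(?P<ws>\s+)|(?P<lit>[^\s\[\(]+)"
)
WHITESPACE = re.compile(r"\s+")


def compile_pattern(pattern, lexicon):
    """Translate the hypothetical pattern notation into a compiled regex.

    lexicon maps a capture name (e.g. "granite") to the words it may match.
    """
    parts = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {pattern[pos]!r}")
        pos = m.end()
        if m.group("opt") is not None:
            body = m.group("opt")
            if body.isalpha():
                # Assumed rule: letters only means an optional word.
                # It absorbs the whitespace that follows it in the pattern.
                ws = WHITESPACE.match(pattern, pos)
                if ws:
                    pos = ws.end()
                    parts.append(rf"(?:{re.escape(body)}\s+)?")
                else:
                    parts.append(rf"(?:{re.escape(body)})?")
            else:
                # Assumed rule: anything else is an optional character class.
                parts.append("[" + re.escape(body) + "]?")
        elif m.group("var") is not None:
            name = m.group("var")
            # Longest alternatives first so a short word cannot shadow a longer one.
            words = sorted(lexicon[name], key=len, reverse=True)
            alternatives = "|".join(re.escape(w) for w in words)
            parts.append(rf"(?P<{name}>{alternatives})")
        elif m.group("ws") is not None:
            parts.append(r"\s+")
        else:
            parts.append(re.escape(m.group("lit")))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


views = compile_pattern(
    "[views] overlooking [the] ($object)",
    {"object": ["ocean", "cityscapes"]},
)
tops = compile_pattern(
    "($granite) counter[- ]tops",
    {"granite": ["granite", "quartz"]},
)

print(views.search("a suite overlooking the ocean").group("object"))
print(tops.search("quartz counter-tops").group("granite"))
print(tops.search("granite counter"))
```

Under these assumptions, translating to the existing regex engine keeps the effort small; the harder part is likely to be pinning down what each piece of notation means.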

Aviad Rozenhek
  • You could take a look at EBNF, which is a way of describing a language. Here are some links to point you in (hopefully) the right direction: [How to describe the grammar of a language](https://tomassetti.me/ebnf/) / [How to implement a mini-language](https://stackoverflow.com/questions/36245685/how-to-implement-a-mini-language) / [Implementing a language in C#](https://www.christianwilson.me/implementing-language-csharp-1/) – André Kool Feb 20 '18 at 10:27
  • I think you need a way to parse a grammar. It seems to me parser combinators are a good fit, but it might be difficult to get started. Let me know if you want me to give an example using Python. – amirouche Apr 04 '18 at 20:26
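
As a rough illustration of the parser-combinator suggestion, the sketch below parses the pattern notation from the question into a list of nodes. It is hand-rolled rather than based on any particular library, and the helper names (`regex`, `choice`, `many`, `parse_pattern`) and node labels are hypothetical choices made for this example.

```python
import re

# A parser is a function (text, pos) -> (value, new_pos), or None on failure.


def regex(pattern, build):
    """Parser that matches a regex at pos and builds a node from the match."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        if m is None:
            return None
        return build(m), m.end()

    return parse


def choice(*parsers):
    """Parser that returns the result of the first alternative that succeeds."""

    def parse(text, pos):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None

    return parse


def many(parser):
    """Parser that applies another parser zero or more times."""

    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)

    return parse


# Building blocks for the notation in the question (node labels are made up).
optional = regex(r"\[([^\]]+)\]", lambda m: ("optional", m.group(1)))
capture = regex(r"\(\$(\w+)\)", lambda m: ("capture", m.group(1)))
space = regex(r"\s+", lambda m: ("space",))
literal = regex(r"[^\s\[(]+", lambda m: ("literal", m.group(0)))

pattern = many(choice(optional, capture, space, literal))


def parse_pattern(text):
    """Parse a whole pattern, failing if any input is left over."""
    nodes, pos = pattern(text, 0)
    if pos != len(text):
        raise ValueError(f"could not parse at position {pos}: {text[pos:]!r}")
    return nodes


print(parse_pattern("($granite) counter[- ]tops"))
```

Each small parser handles one construct, and `choice` and `many` combine them into a parser for the whole pattern; new constructs can be added as further alternatives without touching the existing ones.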

0 Answers