Problem: I have a snippet of code (typically a few lines) in a language such as Java or C++ (not limited to any particular language).

I need to extract the words in it that look like unique identifiers: variable names, function/method names, class names, etc.

Obviously that means skipping all whitespace, newlines, brackets and punctuation, and, importantly, keywords.
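
For illustration only, the filtering described above could be sketched roughly as follows in Python. This is a heuristic, not a parser. The keyword set is a hypothetical, deliberately incomplete sample, and the function name `extract_identifiers` is made up for this example. It also assumes C-style comments and quoted literals, which does not hold for every language.

```python
import re

# Hypothetical, incomplete keyword set; a real one would merge the keyword
# lists of every language the snippets may be written in.
KEYWORDS = {
    "if", "else", "for", "while", "do", "return", "break", "continue",
    "int", "long", "double", "float", "char", "void", "boolean", "bool",
    "class", "new", "public", "private", "protected", "static", "final",
    "const", "this", "true", "false", "null",
}

# Comments and string/char literals are blanked out first so that words
# inside them are not mistaken for identifiers (assumes C-style syntax).
NOISE = re.compile(
    r'//[^\n]*'             # line comment
    r'|/\*.*?\*/'           # block comment
    r'|"(?:\\.|[^"\\])*"'   # double-quoted string
    r"|'(?:\\.|[^'\\])*'",  # single-quoted char or string
    re.DOTALL,
)

# The leading \b keeps numeric suffixes such as the "L" in 10L or the
# "x1F" in 0x1F from being picked up as identifiers.
IDENTIFIER = re.compile(r'\b[A-Za-z_]\w*')


def extract_identifiers(snippet):
    """Return the non-keyword identifiers in order of first appearance."""
    cleaned = NOISE.sub(" ", snippet)
    found = []
    for word in IDENTIFIER.findall(cleaned):
        if word not in KEYWORDS and word not in found:
            found.append(word)
    return found


snippet = (
    'int total = 0; // running sum\n'
    'for (Item item : items) { total += item.getPrice(); }\n'
    'String label = "total: " + total;'
)
print(extract_identifiers(snippet))
# ['total', 'Item', 'item', 'items', 'getPrice', 'String', 'label']
```

Note that library type names such as `String` are kept, since they are identifiers rather than keywords. Whether that is desirable depends on the use case.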

I realize it's somewhat similar to this question: Sanitize/Rewrite HTML on the Client Side

I guess some modification of that code using regexes could get me something approximately good enough. But I wonder if there's a better (cleaner, shorter) way?

LetMeSOThat4U
  • So, you need a parser that recognizes syntax for multiple languages? And you want to do it in a few lines of regular expressions? That's like trying to build an airplane using nothing but a blowtorch and a pair of paperclips – blgt Jul 15 '14 at 07:18
  • Emphatically, I do not need a parser. I need to extract identifiers that are not keywords. – LetMeSOThat4U Jul 15 '14 at 08:46

0 Answers