I am trying to write a regular expression that matches whitespace in a user input string, except for whitespace between quotation marks ("..."). For example, if the user enters

     #load     "my   folder/my  files/    program.prog"     ;

I want my regex substitution to transform this into

#load "my   folder/my  files/    program.prog" ;

So far I've implemented the following (you can run it here).

#include <iostream> 
#include <string>
#include <regex>

int main(){
  // Variables for user input
  std::string input_line;
  std::string program;

  // User prompt
  std::cout << ">>> ";
  std::getline(std::cin, input_line);

  // Trim leading/trailing spaces and collapse inner runs of spaces to one
  input_line = std::regex_replace(input_line, std::regex("^ +| +$|( ) +"), "$1");

  // Check result
  std::cout << input_line << std::endl;

  return 0;
}

But this also collapses the whitespace between quotes. Is there a way to make the regex ignore spaces between quotes?

Luke Collins
  • Your question is pretty clear, however the code doesn't seem to have a lot in common. Please break your question down into smaller questions and provide a code that demonstrates them – GalAbra May 06 '18 at 15:28
  • There's a saying of sorts about regular expressions, it goes something like: "I have a problem. I solved it with regular expressions. Now I have *two* problems". Regular expressions can be immensely powerful, but for many situations they're complete overkill, not to mention that they are also immensely complex. Trimming spaces is one situation where other, simpler solutions might suffice. In your case, a simple copy loop with a boolean flag for "inside string" should be more than enough. – Some programmer dude May 06 '18 at 15:29
  • Sounds like you're looking for trimming... https://stackoverflow.com/questions/216823/whats-the-best-way-to-trim-stdstring – S.Moran May 06 '18 at 15:43
  • @WiktorStribiżew Done – Luke Collins May 10 '18 at 18:23
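
The copy loop with an "inside string" flag suggested in the comments could look roughly like the sketch below. The function name `collapse_spaces` is hypothetical, and the handling of backslash escapes inside quotes is an assumption (the question does not say whether `\"` can occur). Only the space character is treated as whitespace, as in the question's regex.

```cpp
#include <string>

// Hypothetical helper (not from the original post): trims leading/trailing
// spaces and collapses runs of spaces to one, except inside "..." literals.
std::string collapse_spaces(const std::string& in) {
    std::string out;
    bool in_quotes = false;      // true between an opening and a closing "
    bool pending_space = false;  // a run of spaces outside quotes was seen

    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (in_quotes) {
            out += c;  // copy everything inside quotes verbatim
            if (c == '\\' && i + 1 < in.size()) {
                out += in[++i];  // assumption: \x is an escape, so \" does not close
            } else if (c == '"') {
                in_quotes = false;
            }
            continue;
        }
        if (c == ' ') {
            pending_space = true;  // defer: emit at most one space later
            continue;
        }
        // Non-space character outside quotes: emit a single space for the
        // preceding run, unless we are still at the start of the output.
        if (pending_space && !out.empty()) out += ' ';
        pending_space = false;
        out += c;
        if (c == '"') in_quotes = true;
    }
    return out;  // a trailing run of spaces is never flushed, so it is trimmed
}
```

An unterminated quote leaves the rest of the line copied unchanged; whether that is the desired behaviour depends on how the input is parsed afterwards.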

1 Answer

You can add another alternative that matches and captures double-quoted string literals, then re-insert them into the result with another backreference:

input_line = std::regex_replace(
      input_line, 
      std::regex(R"(^ +| +$|(\"[^\"\\]*(?:\\[\s\S][^\"\\]*)*\")|( ) +)"),
      "$1$2");

See the C++ demo.

The \"[^\"\\]*(?:\\[\s\S][^\"\\]*)*\" part matches a ", then 0+ chars other than \ and ", then 0 or more occurrences of an escaped char (a \ followed by any char, matched with [\s\S]), each followed by 0+ chars other than \ and ", and finally the closing ".
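
To see the quoted-literal subpattern on its own, here is a small sketch that extracts the first double-quoted literal from a string. The helper name `first_quoted_literal` is made up for illustration; only the pattern comes from the answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first double-quoted literal in `text`
// (quotes included), or an empty string if there is none.
std::string first_quoted_literal(const std::string& text) {
    // Same subpattern as in the answer: ", unescaped chars, then any number
    // of (escaped char + unescaped chars) groups, then the closing ".
    static const std::regex quoted(R"(\"[^\"\\]*(?:\\[\s\S][^\"\\]*)*\")");
    std::smatch m;
    return std::regex_search(text, m, quoted) ? m.str(0) : std::string();
}
```

Given the input `say "a \"b\"  c" now`, the match should span the whole literal `"a \"b\"  c"`, because each `\"` is consumed as an escaped char and so does not end the match.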

Note that I used a raw string literal, R"(...)", to avoid having to double the backslashes in regex escapes (R"([\s\S])" = "[\\s\\S]").
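
For reuse, the substitution can be wrapped in a small function. This is a minimal sketch: the name `normalize_spaces` is hypothetical, while the pattern and the "$1$2" replacement are the ones shown above. The regex is built once as a function-local static, since constructing a std::regex on every call is comparatively expensive.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the answer's regex_replace call.
std::string normalize_spaces(const std::string& line) {
    static const std::regex pattern(
        R"(^ +| +$|(\"[^\"\\]*(?:\\[\s\S][^\"\\]*)*\")|( ) +)");
    // $1 re-inserts a quoted literal unchanged; $2 re-inserts one space for
    // a run of two or more spaces. Leading/trailing runs set neither group,
    // so they are replaced with nothing.
    return std::regex_replace(line, pattern, "$1$2");
}
```

Applied to the example line from the question, this should return `#load "my   folder/my  files/    program.prog" ;`, with the spacing inside the quotes preserved.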

Wiktor Stribiżew