
Background

I'm looking to remove all lines from a file that start with --, either directly or after leading whitespace ([[:space:]]{1,}). Broadly speaking, I'm looking to achieve results similar to this answer.

Code

/*
 * so_question.cpp
 * Read a text file and remove all lines starting with -- or <space>*--
 * Clean text is passed to cout.
 *
 * Compile and test (the library is listed after the source file, since
 * some linkers resolve symbols left to right):
 * clang++ -Wall -std=c++11 so_question.cpp -o so_question -lboost_regex && ./so_question tst.sql
 */


#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <boost/regex.hpp>
#include <boost/algorithm/string/replace.hpp>

int main(int argc, char *argv[]) {

    // Expect the input file name as the first argument
    if ( argc < 2 )
    {
        std::cerr << "usage: " << argv[0] << " FILE\n";
        return 1;
    }

    // Read file to stringstream
    std::ifstream file( argv[1] );

    if ( file )
    {
        std::stringstream buffer;

        buffer << file.rdbuf();

        file.close();

        // Create a string variable to apply boost::regex
        std::string readText;
        readText = buffer.str();

        // Regular expression finding comments
        boost::regex re_comment("^([[:space:]]{1,}|)--.*(\n|\r|)$");

        // Replace desired lines
        // boost::replace_all(readText, re_comment, " ");

        // Replace via regex replace
        std::string result = boost::regex_replace(readText, re_comment, " ");

        // Show clean text when using regex_replace
        std::cout << "\nClean text:\n" << result << std::endl;

        // Show clean text
        // std::cout << "Clean text:" << readText << std::endl;


        return 0;
    }

    // The file could not be opened
    std::cerr << "cannot open " << argv[1] << '\n';
    return 1;
}

Test data

-- Query taken from:
-- https://stackoverflow.com/a/12467388/1655567
SELECT country.name as country, country.headofstate
from country
    -- Worst place to add comment ever
  -- Here is even uglier
inner join city on city.id = country.capital
where city.population > 100000
-- this comment makes no sense here and would break sql parser but hey
and country.headofstate like 'A%'
-- WOW!

Desired results

SELECT country.name as country, country.headofstate
from country
inner join city on city.id = country.capital
where city.population > 100000
and country.headofstate like 'A%'

Compilation and test

clang++ -Wall -std=c++11 so_question.cpp -o so_question -lboost_regex && ./so_question tst.sql

Problem

The returned text is exactly the same as in the provided file. I reckon the problem is with the particular regex syntax I'm using. However, after reading the Boost.Regex documentation and testing multiple versions of the regex, it's still not clear to me what the right syntax should be.

  • ALE points out that the #includes are not sorted properly. Following the LLVM coding standards, what would the right order be? (This is a side point, unrelated to the main question.)
Konrad
    Could you check whether you have misplaced the `|` alternation symbol? My first guess is that you did, and that it should have been outside the capturing group, i.e. after the `)`. – Rahul Jan 13 '18 at 08:08
  • @Rahul Thanks for your comment. I'll have a look; in effect I was looking for something along the lines of `(\n|\r|\z)`, i.e. any line break or the end of the string. I reckon that `\z` does not work in Boost, but I'll look into the `|`. – Konrad Jan 13 '18 at 08:11
    To check whether a line begins with `--`, you don't have to match the rest of the line with `.*(\n|\r|)$`; `^--` alone would suffice. – Rahul Jan 13 '18 at 08:14
  • @Rahul Fair point. I also want to capture any spaces before `--`; would `[[:space:]]*--` work? – Konrad Jan 13 '18 at 08:19
  • Since you are checking for one or more spaces, ` --` will be covered by the same condition. I have put together [this regex](https://regex101.com/r/m0rIjK/1/) to illustrate. – Rahul Jan 13 '18 at 08:20
  • Do you need to keep the lines with `--` in? If not, you could read the file one line at a time and simply skip any line that matches `^\s*--` (or even just use normal string operations to detect them). – Galik Jan 13 '18 at 08:30
    You're compiling with C++11. Why not use `std::regex`? – rustyx Jan 13 '18 at 09:29
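The suggestions from Galik and rustyx above can be combined into a minimal sketch: read the input one line at a time with `std::getline` and skip every line for which `std::regex_search` finds `^\s*--`, using the C++11 standard `std::regex` instead of Boost. The helper name `skip_comment_lines` is hypothetical, and the sketch assumes that every kept line should be written back with a trailing `\n`.

```cpp
#include <istream>
#include <regex>
#include <string>

// Hypothetical helper: returns the text read from `in` without the lines
// whose first non-whitespace characters are "--". Each line is tested on its
// own, so ^ and \s* cannot reach across a line break. Every kept line is
// written back with a trailing '\n'.
std::string skip_comment_lines(std::istream& in) {
    static const std::regex comment_line(R"(^\s*--)");
    std::string out;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, comment_line)) {
            continue;  // drop the whole comment line
        }
        out += line;
        out += '\n';
    }
    return out;
}
```

In the question's program this would take the place of the whole-buffer `regex_replace`, e.g. `std::cout << skip_comment_lines(file);` once `file` has been opened. A regex-free variant could find the first non-blank character with `find_first_not_of` and compare the next two characters with `"--"`.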

2 Answers


Rearrange your parentheses:

(?m)^(?:--|[[:space:]]+).+

See a demo on regex101.com. Note that [[:space:]]{1,} is the same as [[:space:]]+.

Jan
  • I've tried the pattern `boost::regex re_comment("(?m)^(?:--|[[:space:]]+).+");` and collected the result with `std::string result = boost::regex_replace(readText, re_comment, " ");`, but I keep getting an empty line when running `std::cout << "\nClean text:\n" << result << std::endl;`. – Konrad Jan 13 '18 at 17:08
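A likely reason for the empty output in the comment above: by default Boost.Regex lets `.` match newline characters (passing `boost::match_not_dot_newline` as the flags argument of `boost::regex_replace`, or writing `(?-s)` in the pattern, turns that off), so `.+` runs on to the end of the buffer and the whole text becomes a single space. `[[:space:]]+` can also cross line breaks, and `(?:--|[[:space:]]+)` accepts any line that merely starts with whitespace, not only `--` lines. The sketch below is a swapped-in alternative rather than a fix of that exact pattern: it uses `std::regex`, keeps each match on one line with `[^\n]*`, and removes only lines whose first non-blank characters are `--`. The name `erase_comment_lines` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes every line whose first non-blank characters
// are "--" from a whole buffer with a single std::regex_replace call.
// std::regex does not support inline modifiers such as (?m) (and has no
// multiline option before C++17), so each match is anchored on the '\n' that
// ends the previous line instead of on ^. A '\n' is prepended so the first
// line can be matched the same way.
std::string erase_comment_lines(const std::string& text) {
    // '\n', optional spaces/tabs, "--", then the rest of that line only.
    static const std::regex comment_line(R"(\n[ \t]*--[^\n]*)");
    std::string result = std::regex_replace("\n" + text, comment_line, "");
    // The result is either empty or starts with a '\n' standing in for the
    // one prepended above; drop that single character.
    return result.empty() ? result : result.substr(1);
}
```

Since the pattern relies only on `\n` and `[^\n]` rather than `.` or `^`, swapping in `boost::regex` and `boost::regex_replace` should behave the same way.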

The form of regex_replace that you call returns a new string with the replacements applied; it does not operate in place.

SoronelHaetir
  • I've changed the `boost::regex_replace` call to `std::string result = boost::regex_replace(readText, re_comment, " ");`. – Konrad Jan 13 '18 at 17:09
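To illustrate the distinction the answer above draws, here is a minimal sketch using `std::regex_replace`, which offers the same two kinds of overloads as `boost::regex_replace`: the string form used in the question returns a new string, while an output-iterator form writes the result to a destination you supply. In neither case is the input edited in place, which is why printing `readText` after the call shows the original text. The pattern and the helper names `replaced_copy` and `replace_into` are assumptions for illustration only.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical pattern: a line starting with "--" (optionally after spaces or
// tabs) together with the '\n' before it. A comment on the very first line has
// no preceding '\n' and is therefore left alone.
const std::regex re_comment(R"(\n[ \t]*--[^\n]*)");

// String form (the one called in the question): returns the edited text as a
// new string; `readText` itself is never modified.
std::string replaced_copy(const std::string& readText) {
    return std::regex_replace(readText, re_comment, "");
}

// Output-iterator form: writes the edited text into `out` (cleared first)
// through a back_inserter. `readText` is still only read, not changed.
void replace_into(const std::string& readText, std::string& out) {
    out.clear();
    std::regex_replace(std::back_inserter(out), readText.begin(),
                       readText.end(), re_comment, "");
}
```

Either way, the program has to print (or assign back) the produced text rather than `readText`.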