
Context: I have a CSV file containing a products export from PrestaShop, and I need to remove every occurrence of any Visual Composer shortcode inside it.

I found this regex (here) that might help: "/\[(\/*)?vc_(.*?)\]/"

Now I'm trying to use it with sed, and I built this one-line command, but it is not working at all (sed reports an unknown option to s):

sed -i -E "s\/\[(\/*)?vc_(.*?)\]//\g" origin.csv

What am I missing?

Edit: The problem is in the Product Description column, e.g.:

[vc_row][vc_column width="1/1"][vc_column_text]
ADULT GRAIN FREE
[/vc_column_text][vc_separator color="grey"][vc_column_text]
RICETTA COMPLETA PER CANI ADULTI DI TUTTE LE RAZZE
[/vc_column_text][vc_separator color="grey"][vc_column_text]
65% DI CARNE FRESCA DI POLLO, FRUTTA & VERDURA

It is full of [vc_row] and similar shortcodes. The desired output would be:

ADULT GRAIN FREE
RICETTA COMPLETA PER CANI ADULTI DI TUTTE LE RAZZE
65% DI CARNE FRESCA DI POLLO, FRUTTA & VERDURA

Atomos
  • Please add sample input (no descriptions, no images, no links) and your desired output for that sample input to your question (no comment). – Cyrus Apr 26 '21 at 10:21
  • `sed` does not know `*?` (non-greedy) syntax. – Cyrus Apr 26 '21 at 10:23
  • Added an example as requested – Atomos Apr 26 '21 at 10:31
  • You are backslash-escaping the slashes which should not be escaped, too. – tripleee Apr 26 '21 at 10:34
  • Thanks for editing your question, but your samples of input and expected output are not clear yet. Could you please mention whether you want to print the next line after a match is found? You said it's a CSV file, but your shown samples don't look like CSV format; kindly elaborate more on it, thank you. – RavinderSingh13 Apr 26 '21 at 10:34
  • There are some pitfalls with `sed -i`, check against https://stackoverflow.com/questions/43171648/sed-gives-sed-cant-read-no-such-file-or-directory/43453459#43453459 (admittedly one of my answers). – Yunnosch Apr 26 '21 at 10:36
  • Your sample doesn't look like CSV at all (but it doesn't really matter here; but perhaps the question would be more useful to future visitors if it didn't falsely claim to be about CSV). – tripleee Apr 26 '21 at 10:42



sed does not support non-greedy matching. The regex dialect supported by sed is rather primitive, and far predates the Perl features which are now supported in many regex implementations.

The simple fix is to switch to Perl:

perl -pi -e 's%\[/?vc_(.*?)\]%%g' origin.csv

Notice the switch to alternate delimiters to avoid the need to backslash slashes. You were backslash-escaping the slashes which should not be escaped, too!

It's not impossible to do this in sed, either. Just be more specific about what you want. Non-greedy matching is often a lazy (sic) way to avoid saying what you really mean.

sed -i -E "s%\[/?vc_[^][]*\]%%g" origin.csv

The updated regex says that, after vc_, the match can contain anything except square brackets, which is presumably what you wanted to say all along.

I'm also assuming there can't really be multiple slashes before vc_, so we simply say /? to indicate at most one optional slash.
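Applied to the sample from the question, the bracket-class version removes every shortcode. Lines made up only of shortcodes are left empty, so this sketch adds /^$/d to drop them and match the desired output; that extra command is an assumption, and it would also delete any blank lines that are genuinely part of your data. It reads from a shell variable instead of using -i, just to show the result.

```shell
#!/bin/sh
# Sample taken from the question (one Product Description value)
sample='[vc_row][vc_column width="1/1"][vc_column_text]
ADULT GRAIN FREE
[/vc_column_text][vc_separator color="grey"][vc_column_text]
RICETTA COMPLETA PER CANI ADULTI DI TUTTE LE RAZZE
[/vc_column_text][vc_separator color="grey"][vc_column_text]
65% DI CARNE FRESCA DI POLLO, FRUTTA & VERDURA'

# [^][]* : any run of characters except ] and [ (the ] must come first in the class)
# /^$/d  : assumption: lines left empty after the substitution can be dropped
cleaned=$(printf '%s\n' "$sample" | sed -E 's%\[/?vc_[^][]*\]%%g; /^$/d')

printf '%s\n' "$cleaned"
```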

Nothing here is specific to CSV; this should work for any text file (though to be really correct, you would need a more complex regex to cover corner cases like a vc_ code with a comma inside it; but let's just assume you don't have any).
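Since nothing here is CSV-specific, the same substitution can be run in place on a small throwaway file. A minimal sketch, assuming a hypothetical two-column layout (id,description) rather than the real PrestaShop export; it uses -i.bak (suffix attached to -i), which GNU and BSD sed both accept, so a backup of the original is kept and one of the sed -i portability pitfalls mentioned in the comments is avoided.

```shell
#!/bin/sh
set -e
# Hypothetical CSV: a header plus one row whose quoted description holds shortcodes
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'id,description' \
  '1,"[vc_row][vc_column_text]ADULT GRAIN FREE[/vc_column_text][/vc_row]"' > "$tmp/origin.csv"

# Edit in place, keeping origin.csv.bak as a backup
sed -i.bak -E 's%\[/?vc_[^][]*\]%%g' "$tmp/origin.csv"

result=$(cat "$tmp/origin.csv")
printf '%s\n' "$result"
```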

tripleee