
I have input in which

  • some fields are separated by spaces, and
  • others are enclosed in quotes, contain spaces, and are also separated by spaces.

Here is an example input:

active=1 'oldest active'=0s disabled=0 'function call'=0

I would like to replace:

  • all spaces outside quotes by | and
  • all spaces inside quotes by _

Output would be:

active=1|'oldest_active'=0s|disabled=0|'function_call'=0

I tried different sed and perl solutions found on the web but did not manage to do what I want.

Sundeep
BDR

6 Answers

$ s="active=1 'oldest active'=0s disabled=0 'function call'=0"
$ echo "$s" | perl -pe "s/'[^']*'(*SKIP)(*F)| /|/g; s/ /_/g"
active=1|'oldest_active'=0s|disabled=0|'function_call'=0

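This is a two-step replacement:

  • First, '[^']*'(*SKIP)(*F) matches each single-quoted string and then deliberately fails, skipping past it. The second alternative, a literal space, can therefore only match spaces outside quotes, and these are replaced with |
  • Second, the only spaces left are the ones inside quotes, and they are replaced with _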


Alternate solution:

$ echo "$s" | perl -pe "s/'[^']*'/$& =~ s| |_|gr/ge; s/ /|/g"
active=1|'oldest_active'=0s|disabled=0|'function_call'=0
  • Inspired by this answer
  • s/'[^']*'/$& =~ s| |_|gr/ge replaces all spaces in each match of '[^']*' using a nested substitution. The e modifier evaluates the replacement as Perl code instead of treating it as a string, and the r modifier on the inner substitution returns the modified copy of $& instead of changing it in place
  • The remaining spaces are then taken care of by s/ /|/g


Sundeep
  • Wow, what a quick and efficient answer ! Thanks for the explanation also. Is there a good documentation of all parameters available (like SKIP, ...) ? – BDR Oct 26 '16 at 15:38
  • see http://www.rexegg.com/backtracking-control-verbs.html and http://www.rexegg.com/regex-best-trick.html – Sundeep Oct 26 '16 at 15:53

Using GNU awk's FPAT, you can do this:

s="active=1 'oldest active'=0s disabled=0 'function call'=0"

awk -v OFS="|" -v FPAT="'[^']*'[^[:blank:]]*|[^[:blank:]]+" '{
   for (i=1; i<=NF; i++) gsub(/[[:blank:]]/, "_", $i)} 1' <<< "$s"

active=1|'oldest_active'=0s|disabled=0|'function_call'=0
  • The FPAT regex uses alternation to describe what a field looks like: either a single-quoted value followed by any non-blank characters, i.e. '[^']*'[^[:blank:]]*, or a run of non-blank characters, i.e. [^[:blank:]]+
  • gsub then replaces every space within a field with _; the only spaces left inside fields are the ones within single quotes
  • Finally, OFS="|" joins the output fields with |

Reference: Effective AWK Programming

anubhava

This might work for you (GNU sed):

sed -r ":a;s/^([^']*('[^']*'[^']*)*'[^' ]*) /\1_/;ta;y/ /|/" file

The loop between :a and ta replaces one space inside a quoted string with _ on each pass until none remain; y/ /|/ then translates the remaining spaces to |.
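
The loop and the final translation can be checked separately. The sketch below assumes GNU sed and uses a made-up three-field input. Its pattern lets any number of complete quoted strings precede the one being edited, which is an assumption of this sketch and not necessarily the exact expression above; the names quoted and final are illustrative.

```shell
# Hypothetical input with three quoted fields, one holding two spaces
s="a 'b c' d 'e f' g 'h i j'"
loop=":a;s/^([^']*('[^']*'[^']*)*'[^' ]*) /\1_/;ta"

# Loop only: one quoted space becomes _ per pass until none are left
quoted=$(printf '%s\n' "$s" | sed -E "$loop")
printf '%s\n' "$quoted"
# a 'b_c' d 'e_f' g 'h_i_j'

# Loop plus y: the spaces that remain are outside quotes
final=$(printf '%s\n' "$s" | sed -E "$loop;y/ /|/")
printf '%s\n' "$final"
# a|'b_c'|d|'e_f'|g|'h_i_j'
```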

potong

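@anubhava's solution calls to mind an old-school perl solution:

$ echo "$s" | perl -047 -pe "(\$.%2)?s/ /|/g:s/ /_/g;"
active=1|'oldest_active'=0s|disabled=0|'function_call'=0

-047 sets the input record separator to octal 047, i.e. the single quote, so the input is split into records at each quote. The substitution is then chosen by whether the record number $. is odd (outside quotes) or even (inside quotes).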

albe

We can run a second substitution inside the replacement part by using the e modifier.

$str = "active=1 'oldest active'=0s disabled=0 'function call'=0";
print "\nBEF: $str\n";
# Within each quoted string, replace spaces with _
$str =~ s{'[^']*'}{ my $tes = $&; $tes =~ s/ /_/g; $tes }ge;
# The spaces that remain are outside quotes; replace them with |
$str =~ s/ /|/g;
print "\nAFT: $str\n";

There may be shorter ways to do this.

ssr1012
$ awk -F\' '{OFS=FS; for (i=1;i<=NF;i++) gsub(/ /,(i%2?"|":"_"),$i)}1' file
active=1|'oldest_active'=0s|disabled=0|'function_call'=0
Ed Morton