On Unix, suppose each line of a file contains 5 fields, with data such as:

"112233"|"Roshan"|"25"|" FAX 022 3987789 \| TEL 77766288892 \| abc "|"Male"

I need to extract the 4th field, and am using the command below:

column_value=`echo "$line" | cut -f4 -d'|'`

This only gives: " FAX 022 3987789 \

but I need " FAX 022 3987789 \| TEL 77766288892 \| abc " as the 4th column value.

The effective delimiter should be:

"|"

pepoluan
sailesh

2 Answers

1

cut is not the right tool for the job when parsing the input requires a multi-character delimiter.

You can use GNU Awk with FPAT, which defines what each field in a record should look like. You set FPAT to a string containing a regular expression; something like the following should work.

FPAT = "(\"[^\"]+\")"

Using this in the Awk command,

line='"112233"|"Roshan"|"25"|" FAX 022 3987789 \| TEL 77766288892 \| abc "|"Male"'
awk '
BEGIN {
    FPAT = "(\"[^\"]+\")"
}{print $4}' <<<"$line"

produces the output

" FAX 022 3987789 \| TEL 77766288892 \| abc "
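
One thing to be aware of: the + in that pattern needs at least one character between the quotes, so an empty field ("") can throw the field numbering off. A minimal sketch, assuming GNU Awk is installed as gawk and that empty fields may occur in your data (the sample line with an empty third field is made up for illustration), uses * instead and also strips the surrounding quotes:

```shell
# Hypothetical sample line with an empty third field ("").
line='"112233"|"Roshan"|""|" FAX 022 3987789 \| TEL 77766288892 \| abc "|"Male"'

# FPAT is gawk-specific; "[^"]*" also matches an empty quoted field.
column_value=$(printf '%s\n' "$line" | gawk '
BEGIN { FPAT = "(\"[^\"]*\")" }
{
    field = $4
    gsub(/^"|"$/, "", field)   # drop the surrounding double quotes
    print field
}')

# Brackets make the leading and trailing spaces of the value visible.
printf '[%s]\n' "$column_value"
```

Calling gawk explicitly rather than awk is deliberate here: other awk implementations do not know FPAT and would silently fall back to ordinary field splitting.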


Inian
    See also https://stackoverflow.com/questions/7804673/escaping-separator-within-double-quotes-in-awk – tripleee Aug 30 '17 at 07:20
0

Since the two escaped pipes split the value across fields 4, 5 and 6, you can select all three:

echo $line | cut -f 4,5,6 -d\|

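Alternatively, you could use sed to replace the "|" delimiter with a different character (for example a tab; the \t escape below assumes GNU sed) and then cut on that. The sed expression has to be quoted, otherwise the shell strips the backslashes and \t becomes a plain t:

echo "$line" | sed 's/"|"/\t/g' | cut -f 4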
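
The same idea, treating the three-character sequence "|" as the delimiter, can also be done in a single awk call, because awk treats a field separator longer than one character as a regular expression. This is a minimal sketch, assuming every field is quoted and that "|" never occurs inside a field. The cleanup of the outer quotes is an addition for illustration.

```shell
line='"112233"|"Roshan"|"25"|" FAX 022 3987789 \| TEL 77766288892 \| abc "|"Male"'

# An FS longer than one character is a regex; [|] is a literal pipe.
column_value=$(printf '%s\n' "$line" | awk -F'"[|]"' '
{
    sub(/^"/, "", $1)    # first field keeps the opening quote of the line
    sub(/"$/, "", $NF)   # last field keeps the closing quote of the line
    print $4
}')

# Brackets make the leading and trailing spaces of the value visible.
printf '[%s]\n' "$column_value"
```

Unlike the sed and cut pipeline, this does not depend on a tab never appearing in the data, and it should work with any POSIX awk.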
Ian Kenney