Split and Process text file in sh

Question

I have a comma-separated text file:

60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_6,8401
61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=4_60,8401
60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_5,8401
61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=1_59,8401

I would like to get the output :

60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401

So for each line, I want to flatten the field "tel:+33xxxxxxx;kn-corp-groups=3_6" into "3,6", for example.

Would you have any idea on how I could do this? Thanks

Sorry, this is not the way StackOverflow works. Questions of the form "I want to do X, please give me tips and/or sample code" are considered off-topic. Please visit the [help] and read [ask], and especially read [Why is “Can someone help me?” not an actual question?](http://meta.stackoverflow.com/q/284236) — kvantour, Mar 14 '19 at 14:56

James Brown · Answer 1 · 2019-03-14T13:49:44.747

3

For this data:

$ awk 'BEGIN{FS="[,_=]";OFS=","}{print $1,$2,$3,$4,$5,$7,$8,$9}' file

Output:

60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401

Explained:

$ awk 'BEGIN{
    FS="[,_=]"                    # split input on any of: comma, underscore, equals sign
    OFS=","                       # join output fields with a comma
}
{
    print $1,$2,$3,$4,$5,$7,$8,$9 # skip $6, the "tel:...;kn-corp-groups" text
}' file
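
To see why the field list skips $6, it can help to print every field awk produces for one line under this separator. This is only a minimal sketch on the first sample line from the question; the variable names `line` and `fields` are illustrative and not part of the answer.

```shell
#!/bin/sh
# First sample line from the question.
line='60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_6,8401'

# Split on ',', '_' or '=' and list each field with its index.
fields=$(printf '%s\n' "$line" |
    awk -F'[,_=]' '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }')

printf '%s\n' "$fields"
```

The listing has nine entries, and the sixth one is the leftover `tel:+33xxxxxxx;kn-corp-groups` text, which is why the print statement jumps from $5 to $7.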

edited Mar 14 '19 at 13:49

answered Mar 14 '19 at 13:45

James Brown


1

You're missing a field in the output. Use `[,=_]` for the delimiter and `$1,$2,$3,$4,$5,$7,$8,$9` for the list of fields instead. – JNevill Mar 14 '19 at 13:47

RavinderSingh13 · Answer 2 · 2019-03-14T14:07:30.993

Could you please try the following? If I understood correctly, you need to process the lines that contain the string tel:+33xxxxxxx.

awk -F'[,_=]' 'BEGIN{OFS=","} /tel:\+33xxxxxxx/{print $1,$2,$3,$4,$5,$7,$8,$9}'  Input_file

2nd solution: If you don't want to hard-code the field numbers (these values could be anywhere in Input_file), try the following.

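awk '
BEGIN{
  OFS=","
}
# Match the five leading fields; match() sets RSTART and RLENGTH.
match($0,/^[0-9]+,tel:\+33xxxxxxx,[0-9]+,[0-9]+,[0-9]+/){
  val=substr($0,RSTART,RLENGTH)
  # Match the group values together with the trailing field.
  match($0,/kn-corp-groups=[0-9]+_[0-9]+,[0-9]+/)
  # Skip the 15 characters of "kn-corp-groups=".
  val1=substr($0,RSTART+15,RLENGTH-15)
  sub("_",",",val1)
  print val,val1
  val=val1=""
}'   Input_file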

Output will be as follows.

60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401
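
The second solution depends on awk's match(), which sets RSTART (where the match begins) and RLENGTH (how long it is). A small sketch of that mechanism on one sample line, assuming a POSIX awk; the names `line`, `groups`, `key` and `val` are illustrative only, and length(key) stands in for the hard-coded 15.

```shell
#!/bin/sh
line='61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=4_60,8401'

groups=$(printf '%s\n' "$line" | awk '
# match() returns the start position and sets RSTART/RLENGTH.
match($0, /kn-corp-groups=[0-9]+_[0-9]+/) {
    key = "kn-corp-groups="
    # Skip the key itself and keep only the "4_60" part.
    val = substr($0, RSTART + length(key), RLENGTH - length(key))
    sub(/_/, ",", val)
    print val
}')

printf '%s\n' "$groups"
```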

@Shakile, Could you please check and let me know if this solution is helpful for you? — RavinderSingh13, Mar 15 '19 at 11:41

score 0 · Answer 3 · answered Mar 14 '19 at 14:22


Using gawk (gensub() is a GNU awk extension):

gawk 'BEGIN{ FS=OFS="," } NF {$(NF-1) = gensub(/.*=(.*)_/, "\\1,", 1, $(NF-1))}1' file

Only the next-to-last column, $(NF-1), needs to be processed with gensub(); the NF condition skips empty lines.
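
Where only a POSIX awk is available, the same next-to-last-field rewrite can be sketched with two sub() calls. This is an alternative technique, not the answer's own code, and it assumes the field always ends in the form `=N_M`; the variable names `input` and `result` are illustrative.

```shell
#!/bin/sh
# Two sample lines with an empty line in between.
input='60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_6,8401

61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=4_60,8401'

result=$(printf '%s\n' "$input" | awk 'BEGIN { FS = OFS = "," }
NF {
    sub(/.*=/, "", $(NF-1))   # drop everything up to and including "="
    sub(/_/, ",", $(NF-1))    # turn "3_6" into "3,6"
}
1')

printf '%s\n' "$result"
```

As in the gawk version, the NF condition leaves the empty line untouched, and the final `1` prints every line.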

answered Mar 14 '19 at 14:22

jxc

13,553
4
16
34

score 0 · Answer 4 · answered Mar 14 '19 at 14:31


$ sed 's/[^,]*;[^=]*=\([0-9]*\)_/\1,/' file
60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401

answered Mar 14 '19 at 14:31

Ed Morton


Elias Regopoulos · Accepted Answer · 2019-03-15T08:59:51.303

sed

awk has already been covered by other answers. Here is an alternative using sed:

$ sed -E -e 's/[^,]+;[^=]+=//' -e 's/_/,/' file

Explanation

sed -E enables extended regular expressions.
sed -e adds a sed script to be executed. Remember to enclose each sed script in single quotes (') to stop the shell from expanding it. We need two scripts here.
s/[^,]+;[^=]+=// The first of the two scripts. Strips away the string we don't want (tel:+33xxxxxxx;kn-corp-groups=):
- Substitute (s/)
- one or more characters that are not the comma ([^,]+)
- followed by a single semicolon (;)
- followed by one or more characters that are not the equals sign ([^=]+)
- followed by a single equals sign (=)
- with nothing, i.e. delete the matched string (//).
s/_/,/ The second of the two scripts. Replaces the underscore (_) between the two numbers with a comma (,):
- Substitute (s/)
- a single underscore (_)
- with a comma (/,/).
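
The two substitutions described above can be followed step by step on a single line. A minimal sketch, assuming a sed that accepts -E (GNU and BSD sed both do); the variable names `line`, `step1` and `step2` are illustrative only.

```shell
#!/bin/sh
line='60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_6,8401'

# Step 1: delete the "tel:...;kn-corp-groups=" prefix of the sixth field.
step1=$(printf '%s\n' "$line" | sed -E 's/[^,]+;[^=]+=//')

# Step 2: replace the first remaining underscore with a comma.
step2=$(printf '%s\n' "$step1" | sed 's/_/,/')

printf 'after script 1: %s\n' "$step1"
printf 'after script 2: %s\n' "$step2"
```

The first substitution starts at the sixth field because that is the first place where a run of non-comma characters is followed by a semicolon.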

Alternatives

Some more shell alternatives without awk:

sed piping
The two sed scripts could also have been used with a pipe:
$ sed -E 's/[^,]+;[^=]+=//' file | sed 's/_/,/'
This would be less efficient, but if speed is no concern, some people may find it easier to understand. See this answer for details.
sed + tr
The second part of the pipe above can be exchanged with a simple tr command:
$ sed -E 's/[^,]+;[^=]+=//' file | tr '_' ','
tr + cut
We can also do without sed:
$ tr '=_' ',' < file | cut -d, -f 1-5,7-9
Here, we first replace = and _ with , using tr, so that every field is separated by a comma,
and then print all the fields except the 6th one with cut (-d sets the delimiter, which is the comma, and -f lists the fields we want to print, i.e. all except the 6th).
sed group capturing
See also Ed Morton's answer, which uses sed's capture groups.
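
For the tr + cut alternative, looking at the intermediate output of tr makes the field numbers passed to cut easier to follow. A small sketch on one sample line; the variable names are illustrative, and the second tr set is written out in full here on the assumption that a shorter second set may be handled differently by different tr implementations.

```shell
#!/bin/sh
line='61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=4_60,8401'

# tr maps both "=" and "_" to ",", which turns 7 fields into 9.
after_tr=$(printf '%s\n' "$line" | tr '=_' ',,')

# cut then drops field 6, the leftover "tel:...;kn-corp-groups" text.
after_cut=$(printf '%s\n' "$after_tr" | cut -d, -f 1-5,7-9)

printf 'after tr:  %s\n' "$after_tr"
printf 'after cut: %s\n' "$after_cut"
```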

wrt `awk has already been covered by other answers` - so has sed, see https://stackoverflow.com/a/55165137/1745001. — Ed Morton, Mar 14 '19 at 17:06
@EdMorton you're right, I didn't see it when I initially started writing my answer. I've added a reference to it. — Elias Regopoulos, Mar 15 '19 at 09:01

score 0 · Answer 6 · answered Mar 14 '19 at 15:46

Using Perl regex

perl -pe ' s/(.*)(tel:.*=)(.*)_(.*)/$1$3,$4/ ' file

with your given inputs

$ cat shakile.txt
60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_6,8401
61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=4_60,8401
60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_5,8401
61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=1_59,8401

$ perl -pe ' s/(.*)(tel:.*=)(.*)_(.*)/$1$3,$4/ ' shakile.txt
60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401

$

score 0 · Answer 7 · answered Mar 15 '19 at 09:05


awk '{sub(/_/,",")}{print (substr($0, 1,29) substr($0, 60))}' file

60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401
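
The constants 29 and 60 are character positions, so they only work while every line has the same column widths as the sample. A hedged sketch that derives the same two positions from the data instead, assuming the first "=" in the line is the one in kn-corp-groups=; the names `offsets`, `eq` and `head` are illustrative only.

```shell
#!/bin/sh
line='60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_6,8401'

offsets=$(printf '%s\n' "$line" | awk '{
    eq = index($0, "=")        # position of the first "="
    head = substr($0, 1, eq)   # everything up to and including "="
    sub(/[^,]*$/, "", head)    # trim back to the last comma
    print length(head), eq + 1 # end of the kept prefix, start of the kept tail
}')

printf '%s\n' "$offsets"
```

For the sample line this prints the two numbers used in the answer: the prefix ends at character 29 and the tail starts at character 60.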

answered Mar 15 '19 at 09:05

Claes Wikner

