
I have a text file with a comma (,) separator:

60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_6,8401
61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=4_60,8401
60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_5,8401
61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=1_59,8401

I would like to get this output:

60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401

So, for each line, I want to flatten a field such as "tel:+33xxxxxxx;kn-corp-groups=3_6" into "3,6".

Do you have any idea how I could do this? Thanks

Shakile
    Sorry, this is not the way StackOverflow works. Questions of the form "I want to do X, please give me tips and/or sample code" are considered off-topic. Please visit the help center and read How to Ask, and especially read [Why is “Can someone help me?” not an actual question?](http://meta.stackoverflow.com/q/284236) – kvantour Mar 14 '19 at 14:56

7 Answers


For this data:

$ awk 'BEGIN{FS="[,_=]";OFS=","}{print $1,$2,$3,$4,$5,$7,$8,$9}' file

Output:

60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401

Explained:

$ awk 'BEGIN{
    FS="[,_=]"                    # split fields on any of , _ or =
    OFS=","                       # join output fields with commas
}
{
    print $1,$2,$3,$4,$5,$7,$8,$9 # skip $6 (tel:...;kn-corp-groups)
}' file
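To see why the answer prints $1 to $5 and then $7 to $9, it can help to dump the fields that FS="[,_=]" produces. A minimal sketch; show_fields is a hypothetical helper name, not part of the answer:

```shell
# Hypothetical helper: number and print every field that FS="[,_=]" produces,
# reading the files given as arguments (or stdin when there are none).
show_fields() {
  awk 'BEGIN { FS = "[,_=]" } { for (i = 1; i <= NF; i++) print i ": " $i }' "$@"
}
```

Piping the first sample line into show_fields prints nine numbered fields. Field 6 is tel:+33xxxxxxx;kn-corp-groups, the leftover of the original sixth column, and it is the only one the answer leaves out.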
James Brown
    You're missing a field in the output. Use `[,=_]` for the delimiter and `$1,$2,$3,$4,$5,$7,$8,$9` for the list of fields instead. – JNevill Mar 14 '19 at 13:47

Could you please try the following? If I understood correctly, you need to fetch the lines that contain the string tel:+33xxxxxxx.

awk -F'[,_=]' 'BEGIN{OFS=","} /tel:\+33xxxxxxx/{print $1,$2,$3,$4,$5,$7,$8,$9}'  Input_file


2nd solution: In case you don't want to hard-code the field numbers (these values could be anywhere in Input_file), try the following.

awk '
BEGIN{
  OFS=","
}
match($0,/^[0-9]+,tel:\+33xxxxxxx,[0-9]+,[0-9]+,[0-9]+/){   # first five fields
  val=substr($0,RSTART,RLENGTH)
  match($0,/kn-corp-groups=[0-9]+_[0-9]+,[0-9]+/)            # e.g. kn-corp-groups=3_6,8401
  val1=substr($0,RSTART+15,RLENGTH-15)                       # drop the 15-char kn-corp-groups= prefix
  sub("_",",",val1)                                          # 3_6,8401 -> 3,6,8401
  print val,val1
  val=val1=""
}'   Input_file

Output will be as follows.

60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401
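The 15 in RSTART+15 and RLENGTH-15 is the length of the literal kn-corp-groups= prefix that the second match() includes. A small sketch for inspecting these values; show_match is a hypothetical helper, not part of the answer:

```shell
# Hypothetical helper: print where match() lands on the kn-corp-groups part
# (RSTART, RLENGTH) and what substr(RSTART+15, RLENGTH-15) extracts from it.
show_match() {
  awk '{
    if (match($0, /kn-corp-groups=[0-9]+_[0-9]+,[0-9]+/))
      print RSTART, RLENGTH, substr($0, RSTART + 15, RLENGTH - 15)
  }' "$@"
}
```

For the first sample line this prints 45 23 3_6,8401: the match starts at column 45 and is 23 characters long, and skipping the 15-character prefix leaves 3_6,8401, which sub() then turns into 3,6,8401.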
RavinderSingh13

Using GNU awk (gawk):

gawk 'BEGIN{ FS=OFS="," } NF {$(NF-1) = gensub(/.*=(.*)_/, "\\1,", 1, $(NF-1))}1' file

Here we only need to rewrite the second-to-last column, $(NF-1), with gensub(); the NF condition skips empty lines, and the trailing 1 prints every record.
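gensub() is a GNU awk extension. If only a POSIX awk is available, the same edit of $(NF-1) can be sketched with two plain sub() calls. This is a swapped-in technique rather than the answer's code, and flatten_portable is a hypothetical name:

```shell
# Hypothetical helper: POSIX-awk variant of the gensub() approach.
# Only the second-to-last field is edited; changing it rebuilds $0 with OFS.
flatten_portable() {
  awk 'BEGIN { FS = OFS = "," }
  NF {
    sub(/.*=/, "", $(NF-1))   # drop everything up to and including "="
    sub(/_/, ",", $(NF-1))    # turn "3_6" into "3,6"
  }
  1' "$@"
}
```

Because FS and OFS are both a comma, the inserted comma simply becomes an extra output column, exactly as in the gensub() version.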

jxc
$ sed 's/[^,]*;[^,]*=\([0-9]*\)_/\1,/' file
60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401
Ed Morton

sed

awk has already been covered by other answers. Here is an alternative using sed:

$ sed -E -e 's/[^,]+;[^=]+=//' -e 's/_/,/' file

Explanation

  • sed -E in order to use extended regular expressions.
  • sed -e executes a sed script. Remember to enclose sed scripts in single quotes (') to stop the shell from expanding them. We will need to execute two scripts.

  • s/[^,]+;[^=]+=// The first of the two scripts strips away the part we don't want (tel:+33xxxxxxx;kn-corp-groups=):

    • Substitute (s/)
    • one or more characters that are not the comma ([^,]+)
    • followed by a single semicolon (;)
    • followed by one or more characters that are not the equals sign ([^=]+)
    • followed by a single equals sign (=)
    • with nothing, i.e. delete the matched string (//).
  • s/_/,/ The second of the two scripts replaces the underscore (_) between the two numbers with a comma (,):
    • Substitute (s/)
    • a single underscore (_)
    • with a comma (/,/).

Alternatives

Some more shell alternatives without awk:

  • sed piping
    The two sed scripts could also be chained with a pipe:
    $ sed -E 's/[^,]+;[^=]+=//' file | sed 's/_/,/'
    This would be less efficient, but if speed is no concern, some people may find it easier to understand. See this answer for details.
  • sed + tr
    The second part of the pipe above can be replaced with a simple tr command:
    $ sed -E 's/[^,]+;[^=]+=//' file | tr '_' ','
  • tr + cut
    We can also do without sed:
    $ tr '=_' ',,' < file | cut -d, -f 1-5,7-9
    Here, we first use tr to replace both the = and the _ with a comma, so that every field is comma-separated,
    and then use cut to print all fields except the 6th one (-d sets the delimiter, a comma here, and -f lists the fields to print: 1-5 and 7-9).
  • sed capture groups
    See also Ed Morton's answer, which uses a capture group in sed.

Using a Perl regex

perl -pe ' s/(.*)(tel:.*=)(.*)_(.*)/$1$3,$4/ ' file

With your given input:

$ cat shakile.txt
60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_6,8401
61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=4_60,8401
60,tel:+33xxxxxxx,840191,1,0,tel:+33xxxxxxx;kn-corp-groups=3_5,8401
61,tel:+33xxxxxxx,840191,1,1,tel:+33xxxxxxx;kn-corp-groups=1_59,8401

$ perl -pe ' s/(.*)(tel:.*=)(.*)_(.*)/$1$3,$4/ ' shakile.txt
60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401

$
stack0114106
awk '{sub(/_/,",")}{print (substr($0, 1,29) substr($0, 60))}' file

60,tel:+33xxxxxxx,840191,1,0,3,6,8401
61,tel:+33xxxxxxx,840191,1,1,4,60,8401
60,tel:+33xxxxxxx,840191,1,0,3,5,8401
61,tel:+33xxxxxxx,840191,1,1,1,59,8401
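The offsets 29 and 60 are hard-coded, so this only works while every line has exactly the same widths as the sample (for instance, a 7-character xxxxxxx placeholder). As a sketch under the assumption that each line contains exactly one ; and one =, the cut points can be computed with index() instead; flatten_by_offset is a hypothetical name:

```shell
# Hypothetical helper: same idea as the substr() answer, but the cut points
# are found with index() instead of fixed column numbers.
# Assumes exactly one ";" and one "=" per line.
flatten_by_offset() {
  awk '{
    sub(/_/, ",")                   # "3_6" -> "3,6"
    semi = index($0, ";")           # position of the ";" inside field 6
    eq   = index($0, "=")           # position of the "=" ending the prefix
    head = substr($0, 1, semi - 1)  # everything before the ";"
    sub(/[^,]*$/, "", head)         # drop the start of field 6, keep the trailing comma
    print head substr($0, eq + 1)
  }' "$@"
}
```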
Claes Wikner