0

Need to read line by line a file example below and then get only and put it in separate file file 1

1~abc~select col1,col2,col3,col4 from <tablename> where <condition> group by~xyz
2~abc~select col1,col2,col3      from <tablename> where <condition> group by~xyq
3~abc~select col1,col2           from <tablename> where <condition> group by~xyg

new file expecting

1~abc~<condition>~xyz
2~abc~<condition>~xyq
3~abc~<condition>~xyg

is there any simpler way to get this done using shell script.

3 Answers3

0

This can be achieved using sed utility in unix. A single expression is sufficient actually sed -e 's/select .* where //;s/ group by.*~/~/'

See it in action here:

pankaj@pankaj-mac TP % cat inputfile.txt 
1~abc~select col1,col2,col3,col4 from <tablename> where <condition> group by~xyz
2~abc~select col1,col2,col3      from <tablename> where <condition> group by~xyq
3~abc~select col1,col2           from <tablename> where <condition> group by~xyg

pankaj@pankaj-mac TP % sed -e 's/select .* where //;s/ group by//' inputfile.txt > newfile.txt
pankaj@pankaj-mac TP % cat newfile.txt 
1~abc~<condition>~xyz
2~abc~<condition>~xyq
3~abc~<condition>~xyg

pankaj@pankaj-mac TP % 
Pankaj Saini
  • 1,164
  • 1
  • 5
  • 4
  • 1
    You might want to use a single regex with a back reference. Perhaps prefer `[^~]` to avoid letting the match straddle field boundaries. `sed 's/select [^~]* where \([^~]*\) group by[^~]*/\1/'` – tripleee Jul 16 '22 at 06:37
  • I tried Pankaj Saini solution and it worked thanks a lot for your time and effort – OrangeGallery Jul 17 '22 at 18:21
  • One additional stuff , how about if group by , order by have values as shown below. forgot to add in previous one 1~abc~select col1,col2,col3,col4 from where group by col1,col2,col3,col4 order by col1,col2,col3,col4~xyz 2~abc~select col1,col2,col3 from where group by col1,col2,col3 order by col1,col2,col3~xyq 3~abc~select col1,col2 from where group by col1,col2 order by col1,col2~xyg Need output like 1~abc~~xyz 2~abc~~xyq 3~abc~~xyg – OrangeGallery Jul 18 '22 at 08:43
  • Edit my response based on this additional input. Also the solution proposed by tripleee will work.. – Pankaj Saini Jul 18 '22 at 10:11
0

Using GNU Awk

$ awk -F'~| where | group ' '$0=$1"~"$2"~"$4"~"$NF' file 
$ awk -F'~| where | group ' -v OFS='~' '{print $1,$2,$4,$NF}' file
$ awk -F '~' -v OFS='~' 'gsub(/.* where | group by.*/,"",$3)' file
$ awk '$0=gensub(/(.*)select .* where (.*) group by(.*)/,"\\1\\2\\3",1)' file
$ awk 'match($0,/(.*)select.*where (.*) group by(.*)/,a){$0=a[1]a[2]a[3]}1' file 

1~abc~<condition>~xyz
2~abc~<condition>~xyq
3~abc~<condition>~xyg
ufopilot
  • 3,269
  • 2
  • 10
  • 12
-1

In the slightly more general case where the regexes could also match within some of the other fields, Awk might be a better choice than sed.

awk -F '~' '{
  sub(/.* where /, "", $3);
  sub(/ group by.*/, "", $3)
 }1' file >newfile

The final 1 is the common Awk idiom for "print the current input unconditionally".

This could also be accomplished with a while read -r loop in the shell, but you generally want to avoid that; see Bash while read loop extremely slow compared to cat, why?

tripleee
  • 175,061
  • 34
  • 275
  • 318