
I have a text file (capital_names.txt) containing lines like these:

Warsaw_  
London_  
Oslo_  
...

In another file (capital_info.txt) I have the following lines:

London_1_  
London_2  
cityLondon_3  
capitalWarsaw_1  
Warsaw_2  
...

I want to write a shell script that greps lines from capital_info.txt only when they start with a capital name in the format "Name_".
The desired output is one file per capital, like these:

$ cat Warsaw_output.txt  
Warsaw_2

$ cat London_output.txt   
London_1  
London_2  

Here is the key part of the script:

$outp=$"output"  
while read line; do  
grep ^$line capital_info.txt > $line$outp  
done < capital_names.txt

However, the output files are empty (0 bytes) and have the following names:

'Warsaw_'$'\r''output'  
'London_'$'\r''output'

When I run individual commands (grep -f ^"London_" capital_info.txt) everything works but I cannot do it for 50000 entries in capital_names.txt manually. How can I solve this issue?

RavinderSingh13
duda13
  • the `\r` characters are windows/dos line endings; consider removing these from your file(s) (eg, `dos2unix filename`) and then run your script again – markp-fuso Mar 09 '22 at 16:28
  • There are many problems with the example code. Use [Shellcheck](https://www.shellcheck.net/) to find them, and how to fix them. – pjh Mar 09 '22 at 18:18
  • See [Are shell scripts sensitive to encoding and line endings?](https://stackoverflow.com/q/39527571/4154375) and [How to convert Windows end of line in Unix end of line (CR/LF to LF)](https://stackoverflow.com/q/3891076/4154375). – pjh Mar 09 '22 at 18:22
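
Putting the comments above together, here is a minimal sketch of the loop with the carriage returns stripped and the variables quoted. It is an illustration under assumptions, not a tested fix for the real data: the sample files are recreated in a temporary directory with DOS line endings, `tr -d '\r'` stands in for `dos2unix`, and the `output.txt` suffix is taken from the desired file names in the question.

```shell
#!/bin/sh
# Sketch only: recreate the sample data with DOS (CR/LF) line endings.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'Warsaw_\r\nLondon_\r\nOslo_\r\n' > capital_names.txt
printf 'London_1\r\nLondon_2\r\ncityLondon_3\r\ncapitalWarsaw_1\r\nWarsaw_2\r\n' > capital_info.txt

suffix="output.txt"    # plain assignment: no leading "$" on the variable name

# tr -d '\r' strips the carriage returns (running dos2unix on the files would also do).
tr -d '\r' < capital_names.txt |
while IFS= read -r name; do
    [ -n "$name" ] || continue    # skip blank lines
    # "^$name" anchors the match at the start of the line;
    # quoting keeps the output file name in one piece.
    tr -d '\r' < capital_info.txt | grep -- "^$name" > "${name}${suffix}"
done

cat London_output.txt
```

A name with no matching lines (Oslo_ here) still gets an empty output file. This loop also rereads capital_info.txt once per name, so for 50000 names a single-pass tool such as awk is likely to be much faster.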

1 Answer


With your shown samples and attempts, please try the following awk code. It was written and tested in GNU awk and should work in any awk.

awk '
BEGIN  { FS=OFS="_" }
FNR==NR{
  arr[$1]
  next
}
($1 in arr) && $2~/^[0-9]+$/{
  outFile=($1"_output.txt")
  if(prev!=outFile){ close(prev) }
  print ( $1,$2 ) > (outFile)
  prev=outFile
}
' capital_names.txt capital_info.txt

Explanation: A detailed, line-by-line explanation of the code above.

awk '                                  ##Start of the awk program.
BEGIN  { FS=OFS="_" }                  ##In the BEGIN section, set FS and OFS to _.
FNR==NR{                               ##True only while reading the first file (capital_names.txt).
  arr[$1]                              ##Create an entry in array arr indexed by $1.
  next                                 ##Skip the remaining statements for this line.
}
($1 in arr) && $2~/^[0-9]+$/{          ##If $1 is in arr AND the 2nd field is all digits.
  outFile=($1"_output.txt")            ##Build the output file name in outFile.
  if(prev!=outFile){ close(prev) }     ##If the output file changed, close the previous one to avoid a "too many open files" error.
  print ( $1,$2 ) > (outFile)          ##Print the 1st and 2nd fields to outFile.
  prev=outFile                         ##Remember the current output file name in prev.
}
' capital_names.txt capital_info.txt   ##Input file names.
RavinderSingh13
  • There's a slight problem: closed files will get truncated during the next write. See the content of `test.txt` after running `awk 'BEGIN{f="test.txt"; print "A" > f; close(f); print "B" > f; close(f)}'`. using `>>` would be better (and might require the truncating of the files in `FNR==NR`) – Fravadona Mar 09 '22 at 16:28
  • @Fravadona, files will be closed only when last and current output file names are not same. If I get your comment right here. – RavinderSingh13 Mar 09 '22 at 16:42
  • You're probably right by doing it like that; in the provided sample there's only two continuous `Londons_XXX`, so nothing suggests there exists discontinuous entries in `capital_info.txt` – Fravadona Mar 09 '22 at 16:54
  • Thanks for your feedback and detailed explanation. Unfortunately when I put your code in script.awk (with `#!/usr/bin/awk -f` as top line) and make it executable, I get the following errors: `awk: 3: unexpected character ''' awk: 15: unexpected character '''` when I run it from command line – duda13 Mar 10 '22 at 10:36
  • @duda13, no, you don't need to run this with `#!/usr/bin/awk -f`. Put the whole code, starting from `awk`, into a shell script and then run that; it should fly then. – RavinderSingh13 Mar 10 '22 at 10:39
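
The two points raised in these comments can be sketched together. This is a hedged variant, not the answer's code: it appends with `>>` so that non-adjacent entries for the same name are not lost when a closed file is reopened, and it stores only the program body (without the `awk '...'` wrapper) in a file so that it can be run with `awk -f`. The file name `split.awk`, the reordered sample data, and the `sub(/\r$/, "")` carriage-return cleanup are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: sample data in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'Warsaw_\nLondon_\n' > capital_names.txt
# The London entries are deliberately not adjacent here.
printf 'London_1\nWarsaw_2\ncityLondon_3\nLondon_2\n' > capital_info.txt

# Program body only: no leading "awk '" and no trailing quote.
cat > split.awk <<'EOF'
BEGIN { FS = OFS = "_" }
{ sub(/\r$/, "") }                 # tolerate DOS line endings
FNR == NR { arr[$1]; next }        # first file: remember the names
($1 in arr) && $2 ~ /^[0-9]+$/ {
  outFile = $1 "_output.txt"
  if (prev != "" && prev != outFile) close(prev)
  print $1, $2 >> outFile          # append, so a reopened file keeps its lines
  prev = outFile
}
EOF

awk -f split.awk capital_names.txt capital_info.txt
cat London_output.txt
```

Because `>>` appends, output files left over from an earlier run would collect duplicate lines, so remove them before rerunning.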