I have file1 with records that I want to find in file2. Wherever a record occurs in file2, I want to replace only its alphanumeric characters with #, and redirect the output to file3. With the code below I'm not getting the expected output. What am I doing wrong?

file_read=`cat file2`
while read line; do
  var=`echo $line | tr '[a-zA-Z0-9]' '#'`
  rep=`echo $file_read | awk "{gsub(/$line/,\"$var\"); print}"`
done < file1
echo file2 > file3

cat file1

2001009
@vanti Finserv Co.
2001009
Fund #1
11:11 - Capital
MS&CO(NY)
American Friends Org, Inc. 12X32
Domain-Name (LLC)
MS&CO(NY)
MS&CO(NY)
Ivy/Estate Rd
E*Trade wholesale

cat file2

<html>
<body>
<hr><br><>span class="table">Records</span><table>
<tr class="column">
 <td>Rec1</td>
 <td>Rec2</td>
 <td>Rec3</td>
 <td>Rec4</td>
 <td>Rec5</td>
 <td>Rec6</td>
 <td>Rec7</td>
 <td>Rec8</td>
</tr>
<tr class="data">
<td>@vanti Finserv Co.</td>
<td>11:11 - Capital</td>
<td>MS&CO(NY)</td>
<td>New York</td>
<td>CDX98XSD</td>
<td>E*Trade wholesale</td>
<td>Domain-Name (LLC)</td>
<td>Ivy/Estate Rd</td>
<td></td>
</tr>
<tr class="data">
<td>@vanti Finserv Co.</td>
<td></td>
<td>MS&CO(NY)</td>
<td>2</td>
<td>2</td>
<td>MS&CO(NY)</td>
<td>MS&CO(NY)</td>
<td>Ivy/Estate Rd</td>
</table>
</body>
</html>

expected output cat file3

<html>
<body>
<hr><br><>span class="table">Records</span><table>
<tr class="column">
 <td>Rec1</td>
 <td>Rec2</td>
 <td>Rec3</td>
 <td>Rec4</td>
 <td>Rec5</td>
 <td>Rec6</td>
 <td>Rec7</td>
 <td>Rec8</td>
</tr>
<tr class="data">
<td>@##### ####### ##.</td>
<td>##:## - #######</td>
<td>##&##(##)</td>
<td>New York</td>
<td>CDX98XSD</td>
<td>#*##### ########</td>
<td>######-#### (###)</td>
<td>###/###### ##</td>
<td></td>
</tr>
<tr class="data">
<td>@##### ####### ##.</td>
<td></td>
<td>##&##(##)</td>
<td>2</td>
<td>2</td>
<td>##&##(##)</td>
<td>##&##(##)</td>
<td>###/###### ##</td>
</table>
</body>
</html>
Wiktor Stribiżew
Roshni
  • Please share what you have tried, and what errors do you hit. SO is NOT a "we will just do your task" website/community – Ron Mar 27 '22 at 08:12
  • In your last question you asked to only convert the special symbols; now you want to replace alphanumeric characters, but (if your regular expression didn't contain unescaped characters) you would actually be replacing every character in your file, except for `:`, with `#`. Have a look at [your expression on regex101](https://regex101.com/r/9jJxX4/1). The errors get highlighted in red and explained. – mashuptwice Mar 27 '22 at 08:44
  • What is `file_read=cat file2` supposed to mean? This sets the environment variable `file_read` to `cat`, then tries to execute `file2` as a program. Did you mean `file_read=$(cat file2)`? But you never use the variable `$file_read`. – Barmar Mar 27 '22 at 09:21
  • Don't substitute shell variables directly into the `awk` script. See https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script – Barmar Mar 27 '22 at 09:23
  • Where do you set the `file` variable used in `echo $file`? – Barmar Mar 27 '22 at 09:24
  • You can't have spaces around the `=` in variable assignments like `var =` and `rep =` – Barmar Mar 27 '22 at 09:24
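
The points raised in the last few comments can be illustrated with a minimal sketch: command substitution with no spaces around `=`, and shell values handed to awk with `-v` rather than spliced into the script text. The sample string, the names `line`, `masked`, `find` and `repl`, and the literal `index()`-based replacement are illustrative assumptions, not the asker's code. Literal matching is swapped in here on purpose, so that characters such as `(` and `&` need no escaping.

```shell
#!/bin/sh
# Hypothetical sample record; any line from file1 could stand in here.
line='MS&CO(NY)'

# No spaces around "="; $(...) captures the command's output.
masked=$(printf '%s\n' "$line" | sed 's/[A-Za-z0-9]/#/g')

# Pass shell values with -v instead of interpolating them into the script.
result=$(printf '%s\n' '<td>MS&CO(NY)</td>' |
  awk -v find="$line" -v repl="$masked" '{
    rest = $0; out = ""
    # index() searches for a literal substring, so no regex escaping is needed.
    while ((pos = index(rest, find)) > 0) {
      out = out substr(rest, 1, pos - 1) repl
      rest = substr(rest, pos + length(find))
    }
    print out rest
  }')

printf '%s\n' "$result"
```

One caveat: awk applies escape-sequence processing to `-v` values, so records containing backslashes would need a different route, such as `ENVIRON`.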

2 Answers

You seem to be looking for something like

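awk 'NR==FNR {
  # First file: turn the record into a regex with metacharacters escaped
  regex = $0;
  gsub(/[][(){}|\\*+?.^$]/, "\\\\&", regex);
  a[++n] = regex;

  # Mask alphanumerics, then escape & for use as a gsub replacement
  gsub(/[A-Za-z0-9]/, "#");
  gsub(/&/, "\\\\&");
  b[n] = $0;

  next
}
# Second file: apply every pattern/replacement pair, then print the line
{ for(i=1;i<=n;++i)
    gsub(a[i], b[i])
} 1' file1 file2 >file3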

In brief, we populate the array `a` with the phrases from file1, and `b` with the corresponding replacement strings. The condition `NR==FNR` is true only while the first input file is being read, and `next` skips the rest of the script for those lines. For file2 we fall through to the rest of the script, which simply replaces any match for a pattern in `a` with the corresponding string from `b`, and prints every line.
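
To see the two-file idiom on its own, here is a small sketch. The scratch files `first` and `second` and their contents are made up for illustration; they stand in for file1 and file2.

```shell
#!/bin/sh
# Hypothetical scratch files standing in for file1 and file2.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'x\ny\nz\n' > "$tmp/second"

# NR counts lines across all inputs; FNR restarts at 1 for each file,
# so the two are equal only while the first file is being read.
labels=$(awk 'NR==FNR { print "first: " $0; next }
              { print "second: " $0 }' "$tmp/first" "$tmp/second")

printf '%s\n' "$labels"
rm -rf "$tmp"
```

Note that if the first file is empty, the condition holds for the second file instead, so the idiom assumes file1 has at least one line.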

The code is complicated somewhat by the escaping of regex metacharacters in a and further by the fact that & in the replacement string needs to be escaped, too (& alone recalls the matched text).
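
The two escaping steps can be inspected in isolation. The sketch below feeds a single record from the question's file1 through the same `gsub` calls and prints the resulting pattern and replacement; the variable names `regex` and `repl` and the output labels are illustrative, and the result assumes an awk that follows the POSIX rules for backslashes in replacement strings.

```shell
#!/bin/sh
# Hypothetical single record, taken from the sample file1 in the question.
escaped=$(printf '%s\n' 'MS&CO(NY)' | awk '{
  regex = $0
  # Prefix each regex metacharacter with a backslash.
  gsub(/[][(){}|\\*+?.^$]/, "\\\\&", regex)
  repl = $0
  # Mask the alphanumerics, then protect "&" so gsub() treats it literally.
  gsub(/[A-Za-z0-9]/, "#", repl)
  gsub(/&/, "\\\\&", repl)
  print "pattern:     " regex
  print "replacement: " repl
}')
printf '%s\n' "$escaped"
```

Here the parentheses end up backslash-escaped in the pattern, while in the replacement only the `&` is escaped, since parentheses have no special meaning there.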

Demo: https://ideone.com/YkAkAZ

You generally want to avoid while read loops in the shell; Awk is much faster and more idiomatic when you want to perform some transformation on all lines in a file.

As a further aside, please try http://shellcheck.net/ before asking for human assistance. Even after you fixed syntax errors pointed out in comments, your attempt contains common beginner errors such as broken quoting.

tripleee
  • Thanks for your answer but this doesn't seem to help in case of records like Domain-Name (LLC) or MS&CO(NY) – Roshni Mar 27 '22 at 10:48
  • Thanks for the feedback; updated with a more elaborate version with a demo. – tripleee Mar 27 '22 at 11:35
  • Many thanks. I will keep that in mind:) – Roshni Mar 27 '22 at 12:28
  • Perhaps see also https://stackoverflow.com/questions/65538947/counting-lines-or-enumerating-line-numbers-so-i-can-loop-over-them-why-is-this - yours is not an example of that particular antipattern, but the pretzel logic in your attempt has many semblances to several related beginner approaches. – tripleee Mar 27 '22 at 12:32

Would you please try the following:

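awk '
    # First file: map each record to its masked form
    NR==FNR {s = $0; gsub("[[:alnum:]]", "#"); a[s] = $0; next}
    {
        # Second file: take the first run of text that follows a ">"
        if (match($0, ">[^<]+")) {
            str = substr($0, RSTART+1, RLENGTH-1)
            # Replace it only when it equals a record from file1 exactly
            if (str in a) {
                $0 = substr($0, 1, RSTART) a[str] substr($0, RSTART+RLENGTH)
            }
        }
    }
1 ' file1 file2 > file3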

It assumes the strings to be replaced are enclosed in tags, but it will work with the shown example.
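
As a rough illustration of how the tag-based extraction behaves, the sketch below runs the same `match()` call on one line modelled on a row of file2 and prints `RSTART`, `RLENGTH` and the extracted text. The input line and the variable name `found` are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical one-line input modelled on a row of file2.
found=$(printf '%s\n' '<td>MS&CO(NY)</td>' | awk '{
  # match() sets RSTART and RLENGTH for the leftmost ">text" run.
  if (match($0, ">[^<]+")) {
    # Skip the leading ">" to get just the text between the tags.
    print RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 1)
  }
}')
printf '%s\n' "$found"
```

Because `match()` reports only the leftmost match, a line holding several cells would have just its first text run checked, and the `in` lookup replaces a cell only when its whole content equals a record from file1. Both hold for the sample data, which has one cell per line.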

tshiono