How to read lines from one file, find them in an HTML file and mask them in Unix?

Question

I have a file1 with records coming from my DB. I want to find those records in file2, replace them with #, and redirect the output to file3. Only the alphanumeric characters of the matched records should be translated. With the code below I'm not able to get the expected output. What am I doing wrong? Please help!

code

file_read=cat file2
while read line; do
var = `echo $line | tr '[a-zA-Z0-9]' '#'`
rep = `echo $file | awk "{gsub(/$line/,\"$var\"); print}"`
done < file1
echo file2 > file

cat file1

2001009
@vanti Finserv Co.
2001009
Fund #1
11:11 - capital
MS&CO(NY)
American Friends Org, Inc. 12X32
Domain-Name (LLC)
MS&CO(NY)
MS&CO(NY)
Ivy/Estate Rd
E*Trade wholesale

cat file2

<html>
<body>
<hr><br><>span class="table">Records</span><table>
<tr class="column">
 <td>Rec1</td>
 <td>Rec2</td>
 <td>Rec3</td>
 <td>Rec4</td>
 <td>Rec5</td>
 <td>Rec6</td>
 <td>Rec7</td>
 <td>Rec8</td>
</tr>
<tr class="data">
<td>@vanti Finserv Co.</td>
<td>11:11 - Capital</td>
<td>MS&CO(NY)</td>
<td>New York</td>
<td>CDX98XSD</td>
<td>E*Trade wholesale</td>
<td>Domain-Name (LLC)</td>
<td>Ivy/Estate Rd</td>
<td></td>
</tr>
<tr class="data">
<td>@vanti Finserv Co.</td>
<td></td>
<td>MS&CO(NY)</td>
<td>2</td>
<td>2</td>
<td>MS&CO(NY)</td>
<td>MS&CO(NY)</td>
<td>Ivy/Estate Rd</td>
</table>
</body>
</html>

expected output cat file3

<html>
<body>
<hr><br><>span class="table">Records</span><table>
<tr class="column">
 <td>Rec1</td>
 <td>Rec2</td>
 <td>Rec3</td>
 <td>Rec4</td>
 <td>Rec5</td>
 <td>Rec6</td>
 <td>Rec7</td>
 <td>Rec8</td>
</tr>
<tr class="data">
<td>@##### ####### ##.</td>
<td>##:## - #######</td>
<td>##&##(##)</td>
<td>New York</td>
<td>CDX98XSD</td>
<td>#*##### #########</td>
<td>######-#### (###)</td>
<td>###/###### ##</td>
<td></td>
</tr>
<tr class="data">
<td>@##### ####### ##.</td>
<td></td>
<td>##&##(##)</td>
<td>2</td>
<td>2</td>
<td>##&##(##)</td>
<td>##&##(##)</td>
<td>###/###### ##</td>
</table>
</body>
</html>

I suggest to use an XML/HTML parser (xmlstarlet, xmllint ...) and not awk. — Cyrus, Mar 19 '22 at 03:48
Hi @Cyrus , that's not an option I have. I need to use awk for this. — Roshni, Mar 19 '22 at 04:09
In your example do you want to "mask" any `CDX98XSD` substring in the second file or only `CDX98XSD`? If the latter can you also have `...` tags? Are they always on the same line or can they be split on several consecutive lines? When the strings to mask contain spaces do you want an exact string equality or do you consider that any number of spaces is the same as one? Wouldn't it be simpler to put regular expressions instead of strings in the first file? Please answer by editing your question, not in comments. — Renaud Pacalet, Mar 19 '22 at 07:38

score 0 · Answer 1 · answered Mar 19 '22 at 05:05

You can iterate over each line in the first file, then use awk or sed to replace each occurrence:

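#!/bin/bash -

file2content=$(cat file2)
# IFS= and -r keep leading spaces and backslashes in each record intact
while IFS= read -r line; do
  # build the mask: every letter, digit and space becomes '#'
  mask=$(printf '%s\n' "$line" | tr 'a-zA-Z0-9 ' '#')
  # note: $line is used as a regex here, so characters such as ( ) * / are not matched literally
  file2content=$(printf '%s\n' "$file2content" | awk "{gsub(/$line/,\"$mask\"); print}")
  # you can also use sed for the replacement, like the commented line below:
  # file2content=$(printf '%s\n' "$file2content" | sed "s/$line/$mask/g")
done < file1
printf '%s\n' "$file2content"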

Output:

<html>
<body>
<hr><br><>span class="table">Records</span><table>
<tr class="column">
 <td>ID</td>
 <td>Name</td>
 <td>Address</td>
 <td>City</td>
 <td>Code</td>
 <td>Account Name</td>
 <td>Phone</td>
 <td>Country</td>
</tr>
<tr class="data">
<td>#######</td>
<td>##########</td>
<td>#############</td>
<td>New York</td>
<td>########</td>
<td>00003458</td>
<td>###############</td>
<td></td>
</tr>
</table>
</body>
</html>
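
To check the masking step on its own, here is a minimal sketch that runs tr over a few of the file1 records. It assumes the goal is the question's expected output, where spaces and punctuation are kept, so the space is left out of the tr set. The function name mask_line is hypothetical, and the short '#' replacement set relies on GNU/BSD tr repeating the last character of the second set.

```shell
#!/bin/bash
# mask_line is a hypothetical helper, not part of the original answer.
# It replaces every ASCII letter and digit with '#' and leaves
# spaces and punctuation untouched.
mask_line() {
  printf '%s\n' "$1" | tr 'a-zA-Z0-9' '#'
}

mask_line 'MS&CO(NY)'           # prints ##&##(##)
mask_line 'Domain-Name (LLC)'   # prints ######-#### (###)
mask_line '@vanti Finserv Co.'  # prints @##### ####### ##.
```

Adding the space back into the tr set gives the behaviour of the script above, where spaces are masked too.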

answered Mar 19 '22 at 05:05

Kristian

This will also mask out tags if they are in file1 .. – Mr R Mar 19 '22 at 08:32
I used your code to get the expected output and it's working fine; however, I am not able to mask records like Domain-Name (LLC). What am I missing? – Roshni Mar 27 '22 at 08:07
Maybe because it has parentheses in it, which are special characters in regular expressions. – Kristian Mar 27 '22 at 09:29
How do I escape such characters? – Roshni Mar 27 '22 at 09:44
try this: ```searchEscaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n')``` from https://stackoverflow.com/a/29613573/3706717 – Kristian Mar 27 '22 at 13:29
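
One way to sidestep the escaping problem discussed in these comments is to avoid regular expressions for the search entirely. The sketch below is not the answerer's method: it is a hedged alternative that uses awk's index() and substr() to find each record as a literal string, so characters such as ( ) * & / need no escaping. The function name mask_records is hypothetical. Matching is case-insensitive on the assumption that "11:11 - capital" in file1 should mask "11:11 - Capital" in file2, as the expected output suggests.

```shell
#!/bin/bash
# mask_records is a hypothetical name; $1 is the record list, $2 the HTML file.
mask_records() {
  awk '
    # First file: store each non-empty record, keyed by its lowercase
    # form, together with its masked version (letters and digits -> #).
    NR == FNR {
      if ($0 != "") {
        m = $0
        gsub(/[A-Za-z0-9]/, "#", m)
        mask[tolower($0)] = m
      }
      next
    }
    # Second file: replace every literal occurrence of each record.
    {
      line = $0
      for (rec in mask) {
        out = ""
        # index() does a plain substring search, not a regex match
        while ((i = index(tolower(line), rec)) > 0) {
          out = out substr(line, 1, i - 1) mask[rec]
          line = substr(line, i + length(rec))
        }
        line = out line
      }
      print line
    }
  ' "$1" "$2"
}
```

With the question's file names it could be run as `mask_records file1 file2 > file3`. As noted in the first comment, a record that also appears inside a tag or attribute would be masked there as well, and file1 is assumed to be non-empty.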
