3

I have the following requirement.

I have a text file named a.txt, which contains a list of words:

GOOGLE
FACEBOOK

I have another file named b.txt, which contains:

Company name is google.
Company name is facebook.

There are n lines like this, each with a different word.

I wrote this script:

FILENAME="a.txt"

SCHEMA=$(cat $FILENAME)

for L in $SCHEMA
do
    echo "${L,,}"
    sed -i -E "s/.+/\L&_/" b.txt
done

After running the script, I expect b.txt to contain:

Company name is google_
Company name is facebook_

But the output I actually get is:

Company name is google.__
Company name is facebook.__

This output is saved to b.txt, because the sed command uses -i.

Note: a.txt contains the list of words I want to replace, and b.txt contains paragraphs of lines in which those words appear followed by a dot, such as google. and facebook.

That is why I cannot write a single direct sed command for the replacement.

I hope my requirement is clear.

Thanks in advance!

RavinderSingh13
saurabh704
    Tangentially, [don't use upper case for your private variables.](https://stackoverflow.com/questions/673055/correct-bash-and-shell-script-variable-capitalization) – tripleee Oct 01 '20 at 10:29

4 Answers

1

You can use the following GNU sed solution:

FILENAME="a.txt"
while IFS= read -r L; do
  sed -i "s/\($L\)\./\1_/gI" b.txt
done < "$FILENAME"

Or, the same without a loop as a single line (as used in anubhava's answer):

sed -i -f <(printf 's/\\(%s\\)\\./\\1_/gI\n' $(<"$FILENAME")) b.txt

How the loop and the one-liner work:

  • while IFS= read -r L; do reads the file line by line, assigning each line to L.
  • sed -i "s/\($L\)\./\1_/gI" b.txt replaces every occurrence of L followed by a . in b.txt. L is captured into Group 1 by the capturing \(...\) parentheses, the match is case insensitive because of the I flag, and the replacement is the Group 1 value with _ appended.
  • -f makes sed read its commands from a file, which here is the process substitution.
  • printf 's/\\(%s\\)\\./\\1_/gI\n' $(<"$FILENAME") creates a list of sed commands. In this case it looks like:
s/\(GOOGLE\)\./\1_/gI
s/\(FACEBOOK\)\./\1_/gI
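To try the loop without touching real data, here is a minimal sketch that recreates the two sample files in a scratch directory and runs the loop from this answer on them. The `workdir`, `word` and `result` names are only for illustration, and it assumes GNU sed, since the `I` flag and `-i` without a suffix are GNU extensions.

```shell
# Sketch only: recreate the sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' GOOGLE FACEBOOK > "$workdir/a.txt"
printf '%s\n' 'Company name is google.' 'Company name is facebook.' > "$workdir/b.txt"

# One sed pass per word: replace "<word>." with "<word>_", ignoring case.
while IFS= read -r word; do
  sed -i "s/\($word\)\./\1_/gI" "$workdir/b.txt"
done < "$workdir/a.txt"

# Keep the result in a variable, show it, then clean up.
result=$(cat "$workdir/b.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Note that each word is pasted into the pattern as a regular expression, so a word containing characters such as `.` or `*` would need escaping first.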
Wiktor Stribiżew
  • Thank you @Wiktor Stribiżew for your reply. After applying your script I get google.__ and facebook.__ as output: it adds a double "_" and does not remove the ".". Can you help me with this? In a.txt I have the list of words that I want to change in b.txt, so I run the for loop for every word. – saurabh704 Oct 01 '20 at 10:11
  • @saurabh704 Why eliminate `.` if there is no `.` in your input? Or did you share wrong input? Did you use `sed -i -E "s/.+/\L&_/"` or `sed -i "s/.*/\L&_/"`? If you used the latter, try `sed -i "s/..*/\L&_/"` – Wiktor Stribiżew Oct 01 '20 at 10:13
  • Thank you @Wiktor Stribiżew for your time. I have edited the question please see if you are able to understand my requirement. – saurabh704 Oct 01 '20 at 10:22
  • @saurabh704 See my new answer. Yes, after the edit, the question is much clearer. – Wiktor Stribiżew Oct 01 '20 at 10:25
1

Here is how you can do it in a single shell command, without any loop, using GNU sed with printf in a process substitution:

sed -i -E -f <(printf 's/\\b(%s)\\./\\1_/I\n' $(<a.txt)) b.txt

cat b.txt
Company name is google_
Company name is facebook_

This is far more efficient than running sed or awk in a loop, especially if the input files are large.

  • The printf command creates a sed script that looks like this:
s/\b(GOOGLE)\./\1_/I
s/\b(FACEBOOK)\./\1_/I
  • sed -f runs that dynamically generated script
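If you would rather inspect the generated script before it edits anything, one option is to write it to a file first. The sketch below is an assumption-laden variant, not part of the original answer: `workdir`, `rules.sed` and the variable names are made up for illustration, and it reads a.txt line by line so that the words are not subject to shell word splitting or globbing. It still relies on GNU sed for `-i`, `\b` and the `I` flag.

```shell
# Sketch only: sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' GOOGLE FACEBOOK > "$workdir/a.txt"
printf '%s\n' 'Company name is google.' 'Company name is facebook.' > "$workdir/b.txt"

# Build one substitution command per line of a.txt and save them to a file.
while IFS= read -r word; do
  printf 's/\\b(%s)\\./\\1_/I\n' "$word"
done < "$workdir/a.txt" > "$workdir/rules.sed"

# Review the generated commands, then apply them in a single sed run.
rules=$(cat "$workdir/rules.sed")
printf '%s\n' "$rules"
sed -i -E -f "$workdir/rules.sed" "$workdir/b.txt"

result=$(cat "$workdir/b.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

The single sed run is kept, so the efficiency argument above still applies; the only extra cost is one small temporary file.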
anubhava
1

With a single awk command that reads both input files, you could try the following.

awk '
FNR==NR{
  a[tolower($0)]
  next
}
($(NF-1) in a){
  sub(/\.$/,"")
  print $0"_"
}
' a.txt FS="[ .]" b.txt

Explanation: a detailed, line-by-line explanation of the above solution.

awk '                        ##Starting awk program from here.
FNR==NR{                     ##Checking condition FNR==NR which will be TRUE when a.txt is being read.
  a[tolower($0)]             ##Creating array a with index of current line in lower case from a.txt here.
  next                       ##next will skip all further statements from here.
}
($(NF-1) in a){              ##Checking condition if 2nd last field is present in array a then do following.
  sub(/\.$/,"")              ##Substituting last DOT with NULL here.
  print $0"_"                ##Printing current line with _ here.
}
' a.txt FS="[ .]" b.txt      ##Mentioning a.txt and setting field separator as space and . for b.txt here.


2nd solution: one more solution with awk.

awk '
FNR==NR{
  a[tolower($0)]
  next
}
{
  sub(/\.$/,"")
}
($NF in a){
  print $0"_"
}
' a.txt b.txt
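Both awk commands above print only the matching lines to standard output and leave b.txt unchanged. If the goal is to update b.txt and keep the other lines too, a variant along these lines may help. It is a sketch under assumptions: the `workdir`, `b.tmp`, `words`, `line` and `parts` names are invented here, the extra sample line is only for illustration, and unlike the original it also lower-cases the word taken from b.txt before the lookup.

```shell
# Sketch only: sample files in a scratch directory, with one line that must stay as it is.
workdir=$(mktemp -d)
printf '%s\n' GOOGLE FACEBOOK > "$workdir/a.txt"
printf '%s\n' 'Company name is google.' 'Nothing to change here.' 'Company name is facebook.' > "$workdir/b.txt"

awk '
FNR==NR { words[tolower($0)]; next }    # first file: remember each word in lower case
{
  line = $0
  if (sub(/\.$/, "", line)) {           # only lines that end with a dot are candidates
    n = split(line, parts, " ")
    if (tolower(parts[n]) in words)     # last word is listed: swap the dot for _
      $0 = line "_"
  }
  print                                 # print every line, changed or not
}
' "$workdir/a.txt" "$workdir/b.txt" > "$workdir/b.tmp" && mv "$workdir/b.tmp" "$workdir/b.txt"

result=$(cat "$workdir/b.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Writing to a temporary file and renaming it avoids reading and truncating b.txt at the same time.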
RavinderSingh13
0

This might work for you (GNU sed):

sed 's#.*#s/(&)./\\1_/Ig#' a.txt | sed -i -Ef - b.txt

N.B. The match is case insensitive because of the I flag on the substitution command, but the replacement uses the text as it appears in b.txt: if b.txt contains google, it is matched case-insensitively by GOOGLE and replaced by google_.
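One thing to be aware of: in the generated commands the `.` after the group is unescaped, so it matches any character, not only a literal dot. If only a literal dot should be replaced, a possible variant (a sketch, not the author's command) escapes it in the generated script. The `workdir` directory and the second sample line are assumptions added for illustration, and GNU sed is assumed for `-i`, `-E`, the `I` flag and `-f -`.

```shell
# Sketch only: sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' GOOGLE FACEBOOK > "$workdir/a.txt"
printf '%s\n' 'Company name is google.' 'Visit googles today.' > "$workdir/b.txt"

# Turn each word into "s/(WORD)\./\1_/Ig"; the \\. in the replacement emits an escaped dot.
sed 's#.*#s/(&)\\./\\1_/Ig#' "$workdir/a.txt" | sed -i -Ef - "$workdir/b.txt"

result=$(cat "$workdir/b.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

With the unescaped form, the `s` in "googles" would also be matched by `.` and replaced; with the escaped form that line is left alone.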

potong