I have these files:

  • NotRequired.txt (contains the lines that need to be removed)
  • Need2CleanSED.txt (big file, needs cleaning)
  • Need2CleanGRP.txt (big file, needs cleaning)

content:

more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]

I am reading the file above and want to remove its lines from Need2Clean???.txt. I tried both sed and grep, but with no success.

myFile="NotRequired.txt"

while IFS= read -r HKline
do
  sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"


myFile="NotRequired.txt"

while IFS= read -r HKline
do
  grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"

It looks as if the variable and the [] characters are causing the problem.

Allan
Junipar70
  • You have to use double quotes for the shell to expand variables: `sed -i "/$HKline/d" Need2CleanSED.txt`. – Jack Apr 16 '19 at 11:27
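
To illustrate the quoting point in the comment above, here is a minimal sketch. The value assigned to HKline is a hypothetical stand-in for one line read from NotRequired.txt, with the brackets left out on purpose so that only the quoting behaviour is shown.

```shell
# Hypothetical value standing in for one line of NotRequired.txt
HKline='wat-7600-1_414'

single='/$HKline/d'    # single quotes: the text $HKline stays literal
double="/$HKline/d"    # double quotes: the shell substitutes the value

printf 'single-quoted: %s\n' "$single"
printf 'double-quoted: %s\n' "$double"
```

Double quotes only solve the expansion problem. Characters such as [ and ] inside the expanded value are still interpreted by sed as regex syntax.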

3 Answers

What you're doing is extremely inefficient and error prone. Just do this:

grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt

Thanks to grep -F, the above treats each line of NotRequired.txt as a literal string rather than a regexp, so you don't have to worry about escaping RE metacharacters like [. You also don't need to wrap it in a shell loop: that one command removes all the unwanted lines in a single execution of grep.

Never do command file > file, by the way: the shell sets up the > file redirection, which truncates file, before command starts, so command finds file already empty when it tries to read it. Always do command file > tmp && mv tmp file instead.
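
As a rough, self-contained illustration of the command above, the sketch below builds small sample files in a scratch directory and runs the same grep. The scratch directory and the sample lines in Need2CleanGRP.txt are made up for the demo; only the two NotRequired.txt entries come from the question.

```shell
# Scratch directory so no real files are touched (demo setup only)
workdir=$(mktemp -d)

printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[wat-7600-1_414]' > "$workdir/NotRequired.txt"
printf '%s\n' \
  'keep: first line' \
  'drop: [wat-7600-1_414] status=up' \
  'keep: wat-7600-1 without brackets' \
  'drop: [abc-xyz_pqr-pe2_123]' > "$workdir/Need2CleanGRP.txt"

# -F: patterns are fixed strings, -v: print non-matching lines,
# -f: read one pattern per line from the given file
grep -vF -f "$workdir/NotRequired.txt" "$workdir/Need2CleanGRP.txt" > "$workdir/tmp" &&
mv "$workdir/tmp" "$workdir/Need2CleanGRP.txt"

cat "$workdir/Need2CleanGRP.txt"
```

Two things worth checking before using this on real data: an empty line in NotRequired.txt acts as an empty fixed string, which matches every line, and grep exits with status 1 when it prints nothing, in which case the mv is skipped.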

Ed Morton
  • Thanks for the advice. Just to be clear: I see 2 lines. If these really are 2 lines, what is the meaning of && at the end of the first line? If it's one whole command, then it's okay (the second line executes after the first). && = AND operator – Junipar70 Apr 17 '19 at 10:55
  • The `&&` means `execute the 2nd command only if the first one succeeds`. Without that if the `grep` failed then the `mv` would be replacing your `Need2CleanGRP.txt` file with an empty `tmp` file. – Ed Morton Apr 17 '19 at 12:47
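
The behaviour of && described in the comment above can be seen in a tiny sketch that uses the standard true and false commands. The variable names skipped and executed are just illustrative.

```shell
# 'false' always fails, so the assignment after && is skipped
false && skipped='ran'

# 'true' always succeeds, so the assignment after && runs
true && executed='ran'

printf 'after false: %s\n' "${skipped:-not run}"
printf 'after true:  %s\n' "${executed:-not run}"
```

A line that ends in && is continued on the next line, which is why the two-line form in the answer is a single command list.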

Your assumption is correct. The [...] construct matches any single character in that set, so to match the brackets literally you have to preface ("escape") them with \. The easiest way is to do that in your original file:

sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"

If you don't like that, you can run the sed command at the point where the loop reads its input, using bash process substitution:

done < <(sed -e 's:\[:\\[:' -e 's:\]:\\]:' "$myFile")

Finally, you can use sed on each HKline variable:

HKline=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:' -e 's:\]:\\]:')
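
To make the per-variable escaping concrete, here is a small sketch under some assumptions: the scratch directory and the three sample lines are invented for the demo, and sed prints to standard output instead of editing in place so that nothing real is modified.

```shell
workdir=$(mktemp -d)   # scratch directory for the demo
printf '%s\n' \
  'keep this' \
  'drop [wat-7600-1_414] here' \
  'keep wat-7600-1_414 too' > "$workdir/Need2CleanSED.txt"

HKline='[wat-7600-1_414]'

# Escape [ and ] so sed matches them as literal characters
escaped=$(printf '%s\n' "$HKline" | sed -e 's:\[:\\[:' -e 's:\]:\\]:')
printf 'pattern: %s\n' "$escaped"

# Double quotes let the shell expand $escaped inside the sed script
cleaned=$(sed "/$escaped/d" "$workdir/Need2CleanSED.txt")
printf '%s\n' "$cleaned"
```

Only the line containing the bracketed string is deleted. This sketch escapes nothing but the two brackets, so other regex metacharacters or a / in the pattern would still need handling.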
Jack
  • Thanks Jack/Bummi... it's resolved. I edited my original file NotRequired.txt to \\[abc-xyz_pqr-pe2_123] (added a backslash \ as the first character of each line) and used sed -i -e "/$HKline/d" Need2CleanSED.txt – Junipar70 Apr 16 '19 at 13:24
  • See https://stackoverflow.com/q/29613304/1745001 for some of the other chars and strings you also need to be careful of with this approach and also be wary of any globbing chars or spaces. – Ed Morton Apr 16 '19 at 23:06

Try GNU sed (the -z option needs version 4.2.2 or later):

sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt

Two sed processes are chained by a shell pipe. The first, run with sed -z, slurps NotRequired.txt all at once, replaces each \n with | and escapes the [ and ] metacharacters as \[ and \]. The second process reads that output as its script and applies it to the input file, i.e. Need2CleanSED.txt. Output of the first process:

/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d

Add the -u (unbuffered) option if you want to avoid batched processing; it behaves somewhat like direct I/O.

  • Can you please explain the above and reply to the specific question using these file names (NotRequired.txt, Need2CleanSED.txt)? It will help other users in the future too. – Junipar70 Apr 17 '19 at 11:01
  • Thanks Abdan, just one clarification: if NotRequired.txt has 30000+ lines with a similar pattern, will this handle it and execute successfully, or does it need help from the xargs command? – Junipar70 Apr 18 '19 at 05:53
  • GNU sed version 4.2.1 reports sed: invalid option -- 'z' (I used sed -Ezu; the rest is the same as in your very first line) – Junipar70 Apr 29 '19 at 11:25
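
For a sed without -z, one possible variation on the same two-stage idea is sketched below. It is not the answer's command: instead of joining everything into one alternation, it writes one delete command per line of NotRequired.txt to a script file. The script name delete.sed, the scratch directory, and the sample Need2CleanSED.txt lines are all assumptions made for the demo.

```shell
workdir=$(mktemp -d)   # scratch directory for the demo
printf '%s\n' '[abc-xyz_pqr-pe2_123]' '[lon-abc-tkt_1202]' > "$workdir/NotRequired.txt"
printf '%s\n' \
  'a [lon-abc-tkt_1202] b' \
  'untouched line' \
  '[abc-xyz_pqr-pe2_123]' > "$workdir/Need2CleanSED.txt"

# Stage 1: escape basic-regex metacharacters and the / delimiter,
# then wrap every line as a /pattern/d command
sed -e 's:[][\\.*^$/]:\\&:g' -e 's:.*:/&/d:' \
  "$workdir/NotRequired.txt" > "$workdir/delete.sed"
cat "$workdir/delete.sed"

# Stage 2: apply the generated script to the file being cleaned
remaining=$(sed -f "$workdir/delete.sed" "$workdir/Need2CleanSED.txt")
printf '%s\n' "$remaining"
```

The generated script has one line per pattern, so its size grows with NotRequired.txt. How well that performs with 30000+ lines has not been measured here.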