
I have this shell script:

AVAIL_REMOVAL=$(grep -oPa '^.*(?=(\.com))' $HOME/dcheck/files/available.txt) | sed -i "/$AVAIL_REMOVAL/d" $HOME/dcheck/files/domains.txt

$HOME/dcheck/files/available.txt

unregistereddomain1.com available   15/12/28_14:05:27
unregistereddomain3.com available   15/12/28_14:05:28

$HOME/dcheck/files/domains.txt

unregistereddomain1
registereddomain2
unregistereddomain3

I want to remove the unregistereddomain1 and unregistereddomain3 lines from domains.txt. How can I do that?

Also, is there a faster solution than grep? This benchmark showed that grep needed the most time to execute: Deleting lines from one file which are in another file

EDIT:

This works when available.txt has a single line, but not when it has several:

sed -i "/$(grep -oPa '^.*(?=(\.com))' $HOME/dcheck/files/available.txt)/d" $HOME/dcheck/files/domains.txt

EDIT 2:

Just copying it here to have a backup. This solution is needed for a domain-checker bash script: if it terminates for some reason, it will remove the already-checked lines from the input file at the next restart:

# Extract the already-checked names from both result files, then
# keep only the $DOMAINS lines that are not among them.
grep -oPa --no-filename '^.*(?=(\.com))' "$AVAILABLE" "$REGISTERED" > "$GREPINPUT" \
&& awk 'FNR==NR { a[$0]; next } !($0 in a)' "$GREPINPUT" "$DOMAINS" > "$DOMAINSDIFF" \
&& cat "$DOMAINSDIFF" > "$DOMAINS" \
&& rm -f "$GREPINPUT" "$DOMAINSDIFF"

Most of the domain-checker scripts here try to do this removal at the end of the script. But what happens when the script is terminated and there is no graceful shutdown? Then it will check every single line of the input file again, including the ones that were already checked... This approach solves that problem. This way the script (under proper service management, like docker-compose, systemd, or supervisord) can run for years on list files with millions of entries, until it has completely eaten up the input file!

  • Removing lines which are in another file (entire line, prefix, anywhere, etc) is an extremely common task. If the duplicate I linked to doesn't contain a suitable answer, I'm sure there will be another. Feel free to nominate a better duplicate if you find one. – tripleee Dec 28 '15 at 18:48

1 Answer


from man grep:

-f file
--file=file

   Obtain patterns from file, one per line. The empty file contains
   zero patterns, and therefore matches nothing. (-f is specified by POSIX.)
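
Applied to the files in the question, -f lets you pass all the extracted names as patterns in a single pass. A minimal sketch, assuming GNU grep; the temporary file paths are made up, and -x restricts each pattern to whole-line matches:

# Extract the checked names, then keep only the domains.txt lines
# that do not appear among them.
grep -oPa '^.*(?=(\.com))' "$HOME/dcheck/files/available.txt" > /tmp/checked.txt
grep -vxf /tmp/checked.txt "$HOME/dcheck/files/domains.txt" > /tmp/domains.new
mv /tmp/domains.new "$HOME/dcheck/files/domains.txt"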

Regarding the speed: depending on the regexp, performance may differ drastically. The one you use looks suspicious. Fixed-string matches are almost always the fastest.
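
To exploit that here, -F makes grep treat every pattern from the file as a literal string instead of a regexp. A variant of the sketch above (not benchmarked):

# Same filter, but with fixed-string matching instead of regexps.
grep -vxFf /tmp/checked.txt "$HOME/dcheck/files/domains.txt" > /tmp/domains.new \
&& mv /tmp/domains.new "$HOME/dcheck/files/domains.txt"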
