I have this shell script (it doesn't work: the pipe runs the assignment in a subshell, so $AVAIL_REMOVAL is empty by the time sed runs):
AVAIL_REMOVAL=$(grep -oPa '^.*(?=(\.com))' $HOME/dcheck/files/available.txt) | sed -i "/$AVAIL_REMOVAL/d" $HOME/dcheck/files/domains.txt
Contents of $HOME/dcheck/files/available.txt:
unregistereddomain1.com available 15/12/28_14:05:27
unregistereddomain3.com available 15/12/28_14:05:28
Contents of $HOME/dcheck/files/domains.txt:
unregistereddomain1
registereddomain2
unregistereddomain3
I want to remove the unregistereddomain1 and unregistereddomain3 lines from domains.txt. How can I do that?
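(For illustration, one approach might be grep's own flags: -v inverts the match, -x matches whole lines, -F treats patterns as fixed strings, -f reads them from a file. A sketch with made-up temp file names:)
grep -oPa '^.*(?=(\.com))' "$HOME/dcheck/files/available.txt" > /tmp/checked.txt
grep -vxFf /tmp/checked.txt "$HOME/dcheck/files/domains.txt" > /tmp/domains.new
mv /tmp/domains.new "$HOME/dcheck/files/domains.txt"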
Also, is there a faster solution than grep? This benchmark showed that grep took the longest to run: Deleting lines from one file which are in another file
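(In that direction, a comm-based sketch might help: comm -23 keeps the lines unique to its first input. Both inputs must be sorted, so this reorders domains.txt; file names are made up:)
grep -oPa '^.*(?=(\.com))' "$HOME/dcheck/files/available.txt" | sort > /tmp/checked.sorted
sort "$HOME/dcheck/files/domains.txt" | comm -23 - /tmp/checked.sorted > /tmp/domains.new
mv /tmp/domains.new "$HOME/dcheck/files/domains.txt"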
EDIT:
This works when available.txt has a single line, but not when it has several: the command substitution then expands to multiple lines, and the embedded newlines break the sed expression:
sed -i "/$(grep -oPa '^.*(?=(\.com))' $HOME/dcheck/files/available.txt)/d" $HOME/dcheck/files/domains.txt
EDIT 2:
Just copying it here as a backup. This solution is needed for a domain checker bash script: if the script terminates for some reason, then at the next restart it removes the already-checked lines from the input file:
# Collect the already-checked names (everything before ".com") from both result
# files, then keep only the domains that are not among them:
grep -oPa --no-filename '^.*(?=(\.com))' "$AVAILABLE" "$REGISTERED" > "$GREPINPUT" \
&& awk 'FNR==NR { a[$0]; next } !($0 in a)' "$GREPINPUT" "$DOMAINS" > "$DOMAINSDIFF" \
&& cat "$DOMAINSDIFF" > "$DOMAINS" \
&& rm -f "$GREPINPUT" "$DOMAINSDIFF"
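The variables are assumed to be set along these lines (available.txt and domains.txt appear above; the other three paths are made up for illustration):
AVAILABLE=$HOME/dcheck/files/available.txt
REGISTERED=$HOME/dcheck/files/registered.txt
DOMAINS=$HOME/dcheck/files/domains.txt
GREPINPUT=$HOME/dcheck/files/grepinput.tmp
DOMAINSDIFF=$HOME/dcheck/files/domainsdiff.tmp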
Most of the domain checker scripts here try to do this removal at the end of the script. But what happens when the script is terminated and there is no graceful shutdown? Then it would check every single line of the input file again, including the ones that have already been checked... This approach solves that problem: with proper service management (docker-compose, systemd, supervisord), the script can run for years on input files with millions of lines, until it has completely eaten up the input file!
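To illustrate the idea, a rough skeleton of such a script; purge_checked stands for the cleanup block from EDIT 2, and check_domain is a hypothetical placeholder for the actual availability check:
#!/bin/bash
purge_checked                # remove already-checked domains from $DOMAINS on every (re)start
while read -r domain; do
    check_domain "$domain"   # hypothetical checker; appends results to $AVAILABLE / $REGISTERED
done < "$DOMAINS"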