I downloaded a very large hosts list to block ads. The problem is that it breaks the functionality of some sites, such as forums/discussions and/or pictures. So I want to remove some sites from the hosts file.

Let's say I want to remove a.com and b.com from hosts. These methods work:

grep -ve a.com -e b.com hosts > new_hosts

or

egrep -v 'a.com|b.com' hosts > new_hosts

Both work fine. But as the number of patterns grows, I want to keep the patterns in a file. If I use this

grep -vf pattern.txt hosts > new_hosts

only the last pattern is removed. If pattern.txt contains

a.com
b.com

only b.com is omitted from new_hosts; a.com is still written to new_hosts. So what grep command should I use with a pattern file?

kyrios
  • Check pattern.txt for special characters: `cat -A pattern.txt` or `cat -v pattern.txt`. – Cyrus Oct 22 '18 at 18:56
  • See this post for a detailed discussion of this topic, with solutions: [Fastest way to find lines of a file from another larger file in Bash](https://stackoverflow.com/q/42239179/6862601). – codeforester Oct 22 '18 at 19:54
  • You may want to use the command `grep -vFxf pattern.txt hosts > new_hosts` to make sure the content of your pattern.txt file is treated as strings, rather than regex to prevent `.` from being treated as a wildcard for example (`-F` option), and make sure we match the whole line (`-x` option). – codeforester Oct 22 '18 at 19:57
  • Possible duplicate of [Fast way of finding lines in one file that are not in another?](https://stackoverflow.com/questions/18204904/fast-way-of-finding-lines-in-one-file-that-are-not-in-another) – codeforester Oct 22 '18 at 20:00
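
The `-F` suggestion in the comments can be illustrated with a small sketch. The hosts entries below are made up for the demonstration (in particular `axcom.net`, which only a regex `.` can match), and the `0.0.0.0` prefix is an assumption about the hosts format:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data.
printf '0.0.0.0 a.com\n0.0.0.0 axcom.net\n0.0.0.0 c.com\n' > hosts
printf 'a.com\n' > pattern.txt

# Regex matching: "." matches any character, so the axcom.net line is removed too.
grep -vf pattern.txt hosts > regex_hosts

# Fixed-string matching: only lines containing the literal text "a.com" are removed.
grep -vFf pattern.txt hosts > fixed_hosts

echo "regex:"; cat regex_hosts
echo "fixed:"; cat fixed_hosts
```

Note that `-x` only matches when a pattern equals the entire line, so it fits pattern files that contain complete hosts lines (address included) rather than bare domain names.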

2 Answers

If you have a hosts file that you want to compare with another file containing entries you want to eliminate, this will be easier with uniq than with grep.

Just combine the files and run something like this:

cat hosts badfile badfile | sort | uniq -u > new_hosts

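badfile is listed twice because an entry that appears in badfile but not in hosts would otherwise occur only once and survive `uniq -u`. Listing it twice guarantees that every badfile entry is duplicated and therefore eliminated. Note that this compares whole lines, so badfile must contain the entries exactly as they appear in hosts, and any line that occurs more than once in hosts is dropped as well.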
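
As a rough illustration of why the file is listed twice, here is a sketch with made-up data. The `0.0.0.0` prefix and the entry `z.com` (deliberately absent from hosts) are assumptions for the example:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data; badfile holds complete lines.
printf '0.0.0.0 a.com\n0.0.0.0 b.com\n0.0.0.0 c.com\n' > hosts
printf '0.0.0.0 a.com\n0.0.0.0 z.com\n' > badfile

# badfile listed once: z.com occurs a single time, so uniq -u keeps it.
cat hosts badfile | sort | uniq -u > single_hosts

# badfile listed twice: every badfile line occurs at least twice and is dropped.
cat hosts badfile badfile | sort | uniq -u > new_hosts

echo "once:";  cat single_hosts
echo "twice:"; cat new_hosts
```

The output is sorted, so the original order of the hosts file is not preserved.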

Cyrus
Kyle Banerjee

Thanks for the feedback, guys. Since most of you suspected pattern.txt, I think Windows Notepad caused the error: it terminates each line with 0D 0A (hex), i.e. CR LF.

I read somewhere that the line terminator for grep should be 0A (hex), i.e. LF only. After editing pattern.txt in Notepad++, this command finally works :-)

grep -vf pattern.txt hosts > new_hosts

Or maybe this is better, since fgrep (the same as grep -F) treats the patterns as plain strings:

fgrep -vf pattern.txt hosts > new_hosts

Both work perfectly :-)
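
For anyone who wants to reproduce the problem without an editor, here is a minimal sketch. It assumes the pattern file had CRLF line endings and no newline after the last line, which would explain why only the last pattern matched. The sample data and the name pattern_unix.txt are made up, and `tr -d '\r'` is just one way to strip the carriage returns:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data.
printf '0.0.0.0 a.com\n0.0.0.0 b.com\n0.0.0.0 c.com\n' > hosts

# Simulate a Notepad-style file: CRLF between lines, nothing after the last line.
printf 'a.com\r\nb.com' > pattern.txt

# The first pattern is really "a.com" plus a carriage return, which matches nothing.
grep -vf pattern.txt hosts > broken_hosts

# Strip the carriage returns, then retry with the cleaned file.
tr -d '\r' < pattern.txt > pattern_unix.txt
grep -vf pattern_unix.txt hosts > new_hosts

echo "with CRLF patterns:"; cat broken_hosts
echo "with LF patterns:";   cat new_hosts
```

Running `cat -A pattern.txt` (GNU cat) shows a carriage return as `^M` before the `$` end-of-line marker, which is a quick way to confirm the diagnosis.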

kyrios