How to filter domain in content?

Question

How can I extract just the domain names from a block of text?

For example, I have some text content like this:

dropwox.com N/A     $ 8.95  1 day ago
lute.info   N/A     $ 8.95  1 week ago
zolpidem4sleep.com  N/A     $ 8.95  1 week ago
youredmedsinfo.com  N/A     $ 8.95  1 week ago
youngsmhs.com   N/A     $ 8.95  1 week ago
jsntcj.com  N/A     $ 8.95  1 week ago
fioricetdirect2k.com    13,133,796      $ 8.95  1 week ago
dapoxetinebuynow.com    N/A     $ 8.95  1 week ago
86620000.com    N/A     $ 8.95  1 week ago
spidvid.com 1,884,910       $ 480.00    1 week ago
titsforall.com  20,318,475      $ 8.95  1 week ago

and I just need to extract the domains, so that the list looks like this:

dropwox.com
lute.info
zolpidem4sleep.com
youredmedsinfo.com
youngsmhs.com

Is there a tool or an online converter that can do this?

Any help is appreciated.

I don't see how this is related to DNS. Also, do you have a preference for an operating system? On Linux it's pretty easy. – fvu, Jul 11 '15 at 21:04

score 0 · Answer 1 · edited May 23 '17 at 12:06


If a shell solution is OK, you can do something like this:

cut -d' ' -f1 file | sort | uniq

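This takes the first word of each line (here with cut, but there are several other ways), then sorts the words so that uniq can filter out the duplicates. The sort is needed because uniq only removes repeated lines that are adjacent to each other.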
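Sorting also reorders the list alphabetically, whereas the desired output in the question keeps the original order. If order matters, a common awk idiom (a different technique from the cut | sort | uniq pipeline) prints each first field only the first time it is seen. This is a minimal sketch; the file name full_data.txt and the sample lines are illustrative, with one domain deliberately repeated to show the deduplication:

```shell
# Illustrative input: lines taken from the question, plus a repeated domain
printf '%s\n' \
  'dropwox.com N/A     $ 8.95  1 day ago' \
  'lute.info   N/A     $ 8.95  1 week ago' \
  'dropwox.com N/A     $ 8.95  1 week ago' > full_data.txt

# seen[$1]++ is 0 (false) the first time a domain appears, so !seen[$1]++
# is true only on that first occurrence; input order is preserved.
awk '!seen[$1]++ { print $1 }' full_data.txt > domains.txt
cat domains.txt
```

Unlike sort | uniq, this keeps the domains in the order they appear in the file.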

answered Jul 11 '15 at 21:09

fvu


Any text editor, or a specific one? Look [here](http://superuser.com/questions/520372/how-to-keep-only-the-first-word-in-a-line-using-notepad) for some solutions using Notepad++; they are probably usable as a starting point even if you're using another editor, provided it supports regex-based replace. – fvu Jul 11 '15 at 21:54
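The same kind of regex replace an editor would do (capture the first word, replace the whole line with the capture) can also be run from the command line. A minimal sketch with sed, assuming the input is in a file named full_data.txt; the exact regex used in the linked Notepad++ answers may differ:

```shell
# Illustrative input: two lines from the question
printf '%s\n' \
  'dropwox.com N/A     $ 8.95  1 day ago' \
  'lute.info   N/A     $ 8.95  1 week ago' > full_data.txt

# -E enables extended regexes; ([^[:space:]]+) captures the leading run of
# non-whitespace characters and \1 replaces the whole line with it.
sed -E 's/^([^[:space:]]+).*/\1/' full_data.txt > domains.txt
cat domains.txt
```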

score 0 · Answer 2 · answered Jun 19 '21 at 17:42

This is an old question, but why not answer it for future readers? If you use macOS or Linux, there are plenty of tools for this:

$ cat full_data.txt
dropwox.com N/A     $ 8.95  1 day ago
lute.info   N/A     $ 8.95  1 week ago
zolpidem4sleep.com  N/A     $ 8.95  1 week ago
...

You may use any of the following:

sed: remove everything from the first space to the end of the line:
$ sed 's/ .*//' full_data.txt > domains.txt

grep: with a regular expression, print only (-o) the part of each line from the beginning (^) up to the first whitespace character:
$ grep -o "^\S\+" full_data.txt > domains.txt
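The \S and \+ in the grep pattern are GNU extensions to basic regular expressions, so the macOS (BSD) grep may not interpret them the same way. A more portable sketch uses extended regexes (-E), where + is standard, and the POSIX bracket class [^[:space:]] in place of \S. The -o option is not in POSIX either, but both GNU and BSD grep support it:

```shell
# Illustrative input: two lines from the question
printf '%s\n' \
  'dropwox.com N/A     $ 8.95  1 day ago' \
  'lute.info   N/A     $ 8.95  1 week ago' > full_data.txt

# Print only the leading run of non-whitespace characters on each line
grep -oE '^[^[:space:]]+' full_data.txt > domains.txt
cat domains.txt
```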

cut: pick the first field, using a space as the delimiter:
$ cut -d' ' -f1 full_data.txt > domains.txt
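cut -d' ' only works here because each domain is followed directly by a space. If the columns were copied from a web table, they may be separated by tabs instead (an assumption; the question doesn't say), and then cut -d' ' -f1 would return the whole line. A sketch that first squeezes every run of spaces or tabs into a single space:

```shell
# Hypothetical input: tab-separated first line, space-separated second line
printf 'dropwox.com\tN/A\t$ 8.95\t1 day ago\nlute.info   N/A     $ 8.95  1 week ago\n' > full_data.txt

# tr -s turns each run of blanks (spaces/tabs) into one space, so cut
# always sees a single consistent delimiter.
tr -s '[:blank:]' ' ' < full_data.txt | cut -d' ' -f1 > domains.txt
cat domains.txt
```

If the input is purely tab-separated, plain cut -f1 also works, since tab is cut's default delimiter.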

awk: my beloved awk; print the first field ($1), where by default fields are separated by runs of whitespace:
$ awk '{print $1}' full_data.txt > domains.txt

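Also Perl, the same idea line by line: -n loops over the input lines, -a splits each line into the array @F, and -l strips and re-adds the newline, so $F[0] is the first field: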
$ perl -lane 'print $F[0]' full_data.txt > domains.txt
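All of the commands above assume the domain is the first column. If the domains can appear anywhere in the text, one alternative (a different approach, not one of the tools listed above) is to match domain-like tokens directly. This is a simplified sketch, not a full hostname validator: it requires one or more dot-terminated labels followed by an alphabetic ending of at least two letters, which skips prices such as 8.95 but would also match things like file.txt:

```shell
# Illustrative input: three lines from the question
printf '%s\n' \
  'dropwox.com N/A     $ 8.95  1 day ago' \
  '86620000.com    N/A     $ 8.95  1 week ago' \
  'spidvid.com 1,884,910       $ 480.00    1 week ago' > full_data.txt

# ([A-Za-z0-9-]+\.)+  one or more labels, each followed by a dot
# [A-Za-z]{2,}        a final label of at least two letters
grep -oE '([A-Za-z0-9-]+\.)+[A-Za-z]{2,}' full_data.txt > domains.txt
cat domains.txt
```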
