
I have a big list of domain names, like:

site.com
ns1.site.com
ns2.site.com
test.main.site.com
google.com
mail.google.com
etc.

The row count is around 10^9. I am looking for the best way to store this data and to find all subdomains of a given main domain.

For example, the search query is:

site.com

The result will be:

ns1.site.com
ns2.site.com
test.main.site.com
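
To make the matching rule concrete, here is a naive sketch of what I mean (a plain suffix match in Python, which is obviously far too slow for ~10^9 lines):

```python
def find_subdomains(domains, query):
    """Return every entry that is a subdomain of `query` (suffix match on '.' + query)."""
    suffix = "." + query
    return [d for d in domains if d.endswith(suffix)]

domains = ["site.com", "ns1.site.com", "ns2.site.com",
           "test.main.site.com", "google.com", "mail.google.com"]
print(find_subdomains(domains, "site.com"))
# ['ns1.site.com', 'ns2.site.com', 'test.main.site.com']
```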

Any ideas on how to do it?

Thanks

Joe Brew
  • What language/tool are you using? If you have `grep` installed you can do `grep "site.com" inputfile`. – builder-7000 May 08 '18 at 21:18
  • I am searching for the best language/tool for this task. `grep` is too slow for me; I have ~10^9 lines. – Joe Brew May 09 '18 at 08:25
  • You can speed up grep with the techniques mentioned in this post: https://stackoverflow.com/questions/13913014/grepping-a-huge-file-80gb-any-way-to-speed-it-up – builder-7000 May 09 '18 at 08:28

1 Answer


You can use a real-time full-text search tool and store each domain name as a separate document. Then you can run a LIKE-style (wildcard) query with the given input, and it will return all domain names that contain your input string.

Two popular real-time full-text search engines are Apache Solr and Elasticsearch. Either should satisfy your requirements.
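
For example, here is a minimal sketch using the official Python Elasticsearch client. Everything in it is illustrative: it assumes a local cluster at `http://localhost:9200`, client version 8.x, and an index named `domains` whose `name` field is mapped as `keyword`.

```python
from elasticsearch import Elasticsearch

# Assumed: a local Elasticsearch 8.x cluster and an index named "domains"
# with a keyword-mapped field "name" holding one domain per document.
es = Elasticsearch("http://localhost:9200")

# Index a few example domains (for ~10^9 rows you would use the bulk API).
for domain in ["site.com", "ns1.site.com", "ns2.site.com",
               "test.main.site.com", "google.com", "mail.google.com"]:
    es.index(index="domains", document={"name": domain})
es.indices.refresh(index="domains")

# Wildcard query: every domain ending in ".site.com",
# i.e. all subdomains of site.com.
resp = es.search(
    index="domains",
    query={"wildcard": {"name": {"value": "*.site.com"}}},
    size=1000,
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["name"])
```

Note that a leading wildcard such as `*.site.com` is expensive on a very large index; a common design choice is to also store each domain with its labels reversed (e.g. `com.site.ns1`), so the same lookup becomes a cheap prefix query.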

MacakM