
Consider a string:

let a = "I visit google.com often times but.. not amazon.uk"

How can I extract google.com and amazon.uk from the string above in JavaScript?

Lahfir
  • `[a-zA-Z0-9]+\.[a-zA-Z0-9]{2,}` might do the trick for most sites, but I'm strongly against relying on this approach alone; it's very inaccurate. You should capture the second part (the extension) as a group and test it against a [known list of TLDs](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains). Also, if you take a look at the RFC for domain names (I forgot the exact number), you will find that the entire Unicode range (beyond the modern Latin alphabet) is valid. Correct me if I'm wrong. – Bagus Tesa Jun 22 '22 at 14:56
  • This [Q&A regarding a regex for capturing URLs](https://stackoverflow.com/q/3809401) is a nice start. It would be best if you could: 1) check for valid TLDs; 2) check whether the actual site has a DNS record. – Bagus Tesa Jun 22 '22 at 14:59
  • Added a solution; does it address your question? – Naveed Jun 22 '22 at 21:19
  • @Naveed Thanks for your solution, but it only works for .com or .uk. I want to capture all the URLs, even if they use some other domain extension. – Lahfir Jun 23 '22 at 08:58
  • @Lahfir, you can add those domains here, delimited with the pipe, and it will work: (.uk|.com). – Naveed Jun 23 '22 at 13:10
  • We would need to know either a pattern that identifies these as domains or a list of the domains we want to search for. The solution presented will work when you already have a list of domains, and it answers the question from that standpoint. – Naveed Jun 23 '22 at 13:35
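
The approach suggested in the comments above (match a generic `label.extension` candidate, then test the extension against a known list) could be sketched roughly as follows. The `KNOWN_TLDS` set and the `extractDomains` helper are hypothetical names used for illustration, and the tiny allowlist is an assumption; a real list would hold far more entries.

```javascript
// Hypothetical, deliberately tiny allowlist (an assumption for this sketch);
// a real one would be loaded from a full list of top-level domains.
const KNOWN_TLDS = new Set(["com", "uk", "io", "org", "net"]);

// Hypothetical helper: find "label.tld" candidates and keep those whose
// TLD is in the allowlist.
function extractDomains(text) {
  // Group 1 is the label, group 2 is the candidate TLD.
  const candidate = /([a-zA-Z0-9]+)\.([a-zA-Z0-9]{2,})/g;
  const domains = [];
  for (const match of text.matchAll(candidate)) {
    if (KNOWN_TLDS.has(match[2].toLowerCase())) {
      domains.push(match[0]);
    }
  }
  return domains;
}

const a = "I visit google.com often times but.. not amazon.uk";
console.log(extractDomains(a)); // [ 'google.com', 'amazon.uk' ]
```

This sketch only handles a single label followed by a TLD, so multi-label names such as subdomains, hyphenated labels, and non-ASCII domains would need a more permissive pattern.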

3 Answers


Try this:

let a =  "I visit google.com often times but.. not amazon.uk"
a.match(/("[^"]+"|[^"\s]+)/g);

Output:

[
    "I",
    "visit",
    "google.com",
    "often",
    "times",
    "but..",
    "not",
    "amazon.uk"
]
yanir midler
  • Thanks for the answer, but what if there is a domain with some other extension, such as .io? Do you suggest storing the list of extensions in an array and comparing against that? – Lahfir Jun 22 '22 at 14:48
  • I think you need to write a custom parser for it. – Shkar Sardar Jun 22 '22 at 14:50

Here is one way to do it (the dots are escaped so that they match a literal "." rather than any character):

\s(\w+)(\.uk|\.com)\b

Here is a fiddle link for JavaScript:

https://jsfiddle.net/y25wz3ae/


https://regex101.com/r/HFyxEJ/1

Result: [('google', '.com'), ('amazon', '.uk')]
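
To cover more extensions without editing the pattern by hand, the alternation can be built from a list. The following is a minimal sketch, assuming a hypothetical `extensions` array and a hypothetical `findDomains` helper; neither comes from the fiddle linked above.

```javascript
// Hypothetical list of extensions to search for; extend as needed.
const extensions = [".com", ".uk", ".io"];

// Escape regex metacharacters so that "." matches a literal dot.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: build the pattern from the list and return
// [name, extension] pairs for every match.
function findDomains(text, exts) {
  const alternation = exts.map(escapeRegExp).join("|");
  const pattern = new RegExp(`\\s(\\w+)(${alternation})\\b`, "g");
  return Array.from(text.matchAll(pattern), (m) => [m[1], m[2]]);
}

const text = "I visit google.com often times but.. not amazon.uk";
console.log(findDomains(text, extensions));
// [ [ 'google', '.com' ], [ 'amazon', '.uk' ] ]
```

Because the pattern starts with `\s`, a domain at the very beginning of the string would not be matched; replacing `\s` with `\b` is one way to relax that.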

Naveed

To solve this problem, I've created an API that extracts URLs from a string or an array of strings.

Base URL: https://urlsparser.herokuapp.com/

GET https://urlsparser.herokuapp.com/url

For a single string

{
  "string" : "More here http://action.mySite.com/trk.php?mclic=P4CAB9542D7F151&urlrv=http%3A%2F%2Fjeu-centerparcs.com%2F%23%21%2F%3Fidfrom%3D8&urlv=517b975385e89dfb8b9689e6c2b4b93d text<br/>And more here http://action.mySite.com/trk.php?mclic=P4CAB9542D7F151&urlrv=http%3A%2F%2Fjeu-centerparcs.com%2F%23%21%2F%3Fidfrom%3D8&urlv=517b975385e89dfb8b9689e6c2b4b93d"
}

For an array of strings

{
  "string" : ["string1","string2"....]
}


Advantages

  1. Supports more than 900 domain extensions (.com, .io, ...)
  2. Fast: extracts results in less than 20 ms
Lahfir