3

I'm trying to extract the domain name from each string in 'tweets'. How can I avoid capturing the two leading forward slashes (//) along with it? The regular expression I'm using is defined in let url:

let tweets = [
  "Thank you to the Academy and the incredible cast & crew of #TheRevenant. #Oscars",
  "@HardingCompSci department needs student volunteers for #HourOfCode https://hourofcode.com/us",
  "Checkout the most comfortable earbud on #Kickstarter and boost your #productivity https://www.kickstarter.com/",
  "Curious to see how #StephenCurry handles injury. http://mashable.com/2016/04/25/steph-curry-knee-injury-cries-cried/"
];


let url = /\/\/.+?\.com?/;

tweets.forEach(function(tweet) {
  console.log(url.exec(tweet));
});
Nik
xxddd_69
    Does this answer your question? [Regular expression to find URLs within a string](https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string) – Yair Cohen Jan 07 '21 at 04:28

2 Answers

1

Use a Capturing Group

A part of a pattern can be enclosed in parentheses (...). This is called a “capturing group”.

That has two effects:

1. It lets us get a part of the match as a separate item in the result array.
2. If we put a quantifier after the parentheses, it applies to the parentheses as a whole.
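
As a minimal sketch of those two effects (the input strings and the patterns here are made up for illustration and are not the ones from the question):

```javascript
// Effect 1: the parenthesised part is returned as its own array item.
const match = /\/\/(\w+)\.com/.exec("see https://example.com/page");
const wholeMatch = match[0];  // the full match, including the slashes
const firstGroup = match[1];  // only the text matched inside ( ... )

// Effect 2: a quantifier after the parentheses repeats the whole group.
const repeated = /(go)+/.exec("gogogo now");
const repeatedMatch = repeated[0];

console.log(wholeMatch, firstGroup, repeatedMatch);
// //example.com example gogogo
```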

In your code you have let url = /\/\/.+?\.com?/;

You are only interested in the part following the 2 slashes, so make a capturing group for that part by enclosing it in parentheses: let url = /\/\/(.+?\.com?)/;

Then change the code in the loop a bit to get the result from the first capturing group and you end up with:

let tweets = [
  "Thank you to the Academy and the incredible cast & crew of #TheRevenant. #Oscars",
  "@HardingCompSci department needs student volunteers for #HourOfCode https://hourofcode.com/us",
  "Checkout the most comfortable earbud on #Kickstarter and boost your #productivity https://www.kickstarter.com/",
  "Curious to see how #StephenCurry handles injury. http://mashable.com/2016/04/25/steph-curry-knee-injury-cries-cried/"
];


let url = /\/\/(.+?\.com?)/;

tweets.forEach(function(tweet) {
  var match = url.exec(tweet)
  console.log(match && match[1] || match);
});
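
One thing to be aware of: in \.com? the ? makes the m optional, so the pattern can also stop at ".co" in the middle of a longer host name. If that matters, a possible variant (a sketch only, not part of the code above; extractHost is a hypothetical helper name) is to capture everything after the slashes up to the next slash or whitespace:

```javascript
// Hypothetical helper, not from the original answer: returns the host part
// of the first URL-like substring, or null when the text contains none.
function extractHost(text) {
  // After "//", capture one or more characters that are neither "/" nor whitespace.
  const match = /\/\/([^\/\s]+)/.exec(text);
  return match ? match[1] : null;
}

const hosts = [
  "volunteers for #HourOfCode https://hourofcode.com/us",
  "boost your #productivity https://www.kickstarter.com/",
  "cast & crew of #TheRevenant. #Oscars",
].map(extractHost);

console.log(hosts);
// [ 'hourofcode.com', 'www.kickstarter.com', null ]
```

This no longer assumes a .com ending, but it also keeps a port number or user info if the URL contains one.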
Stijn de Witt
  • I used `match && match[1] || match` to match the original code as closely as possible. If it were my code, I would write `match && match[1]` and let the first iteration return null. – Stijn de Witt Jan 07 '21 at 04:39
0

I made a quick script for your question, using the new URL() constructor.

It splits each tweet into words and tests each word. When a URL is found, it is added to the urls array.

let tweets = [
       "Thank you to the Academy and the incredible cast & crew of #TheRevenant. #Oscars",
       "@HardingCompSci department needs student volunteers for #HourOfCode https://hourofcode.com/us",
       "Checkout the most comfortable earbud on #Kickstarter and boost your #productivity https://www.kickstarter.com/",
       "Curious to see how #StephenCurry handles injury. http://mashable.com/2016/04/25/steph-curry-knee-injury-cries-cried/"
    ];
 
let urls = []
 
function getURL(me){
  me.split(" ").forEach(function(e){
    try { 
      new URL(e);
      console.log(e + " is a valid URL!")
      urls.push(e)
    } 
    catch (error){
      console.log(error.message);
    }
  })

}

tweets.forEach(function(tweet){
  getURL(tweet)
})

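// Show the collected URLs in the element with id "url" (markup below).
document.getElementById("url").innerHTML = urls.join("<br>");

<div id="url"></div>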
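
The script above collects whole URLs. Since the question asks for the domain name, a possible follow-up (a sketch; getHostnames is a hypothetical helper and the http/https filter is an added assumption) is to read the hostname property of the parsed URL object:

```javascript
// Hypothetical helper: collects the hostname of every word that parses as an
// http(s) URL. The URL constructor throws a TypeError on invalid input.
function getHostnames(text) {
  const hostnames = [];
  for (const word of text.split(/\s+/)) {
    try {
      const parsed = new URL(word);
      // Assumption: only web links are wanted, so other schemes are skipped.
      if (parsed.protocol === "http:" || parsed.protocol === "https:") {
        hostnames.push(parsed.hostname);
      }
    } catch (error) {
      // Not a URL; skip this word.
    }
  }
  return hostnames;
}

const found = getHostnames(
  "Curious to see how #StephenCurry handles injury. http://mashable.com/2016/04/25/steph-curry-knee-injury-cries-cried/"
);
console.log(found);
// [ 'mashable.com' ]
```

Because the parsing is done by the URL constructor, no regular expression for the slashes or the top-level domain is needed.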
NVRM