-1

I have a list of http:// addresses and I need to count the domains using regular expressions in JS. I have no idea how to do that, as the addresses have different lengths and some are similar to each other. How can I achieve that? Regular expressions are my nightmare. Here is my list.

Calle Dybedahl

2 Answers

0

Using a modified regular expression from the thread "What is a good regular expression to match a URL?", you can count the number of matches like this:

// Your original list of addresses
const data = `
http://www.gaba.ch/fr_CH/519/Netuschil-L-et-al-Eur-J-Oral-Sci-103-1995-355-361.htm?Subnav2=ResearchProducts&Article=17516
http://www.gaba.fi/fi_FI/725/Suche.htm?Page=42
http://www.gaba.ch/fr_CH/538/Recomend-Page.htm?LinkID=576&Brand=meridolHalitosis&Subnav=&Product=312435
http://www.gaba.com/en/1071/Professor-Edwin-G-Winkel.htm
http://www.gaba.ch/fr_CH/580/Congress-Calendar.htm?CongressId=289461&Page=6
// ... etc
`;

// Make sure you include the g flag to find all the matches and not just one
const addresses = data.match(/https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?&//=]*)/g);

// Get length of the matched array
// - In this example: 5
// - In your case: 4815
const addressesCount = addresses.length;

EDIT:

Based on your comment, I made a few adjustments to the code:

// Your original list of addresses
const data = `
http://www.gaba.ch/fr_CH/519/Netuschil-L-et-al-Eur-J-Oral-Sci-103-1995-355-361.htm?Subnav2=ResearchProducts&Article=17516
http://www.gaba.fi/fi_FI/725/Suche.htm?Page=42
http://www.gaba.ch/fr_CH/538/Recomend-Page.htm?LinkID=576&Brand=meridolHalitosis&Subnav=&Product=312435
http://www.gaba.com/en/1071/Professor-Edwin-G-Winkel.htm
http://www.gaba.ch/fr_CH/580/Congress-Calendar.htm?CongressId=289461&Page=6
// ... etc
`;

// Find the domain part of each URL. Note: with the g flag, match() returns
// the full matches, so each entry still starts with "http://www."
const addresses = data.match(/https?:\/\/(?:www)?\.((?:.+?)\.[\w\.]{2,5})/g);

// Filter the addresses to only unique ones
const unique = addresses.reduce((acc, cur) => acc.indexOf(cur) > -1 ? acc : acc.concat(cur), []);

// Get number of unique addresses found
// - In this example: 3
// - In your case: 28
const length = unique.length;
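
The `reduce` line above walks through the matches and appends the current one to the accumulator only if it is not already there. As a minimal sketch (the `matches` array and the names `viaReduce` and `viaSet` are made up for illustration), the same deduplication can also be written with a `Set`:

```javascript
// Illustrative input: three matches, one of them a duplicate
const matches = ["http://www.gaba.ch", "http://www.gaba.fi", "http://www.gaba.ch"];

// reduce version from the answer: keep acc unchanged if cur is already in it,
// otherwise return a new array with cur appended
const viaReduce = matches.reduce(
  (acc, cur) => (acc.indexOf(cur) > -1 ? acc : acc.concat(cur)),
  []
);

// Set version: a Set stores each value once; spreading turns it back into an array
const viaSet = [...new Set(matches)];

console.log(viaReduce.length, viaSet.length);
```

Both variants keep the first occurrence of each value and preserve the original order.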

Note: addresses like http:/www.bnf.org/bnf/bnf/54/%3C won't be matched because they are not valid (there is only one slash after "http:").
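
If you also want to know how often each domain occurs, here is a rough sketch that does not rely on one large regular expression. It assumes the addresses are separated by whitespace and uses the built-in `URL` class to extract the hostname. The function name `countDomains` and the `sample` variable are hypothetical, and stripping a leading "www." is my assumption about what should count as the same domain. Which strings count as parseable is decided by the `URL` parser, so the result may differ from the regex approach above.

```javascript
// Hypothetical helper: count occurrences of each hostname in a block of text
function countDomains(text) {
  const counts = new Map();
  for (const token of text.split(/\s+/)) {
    // Skip empty tokens and anything that is not an http(s) address
    if (!/^https?:\/\//i.test(token)) continue;
    let host;
    try {
      host = new URL(token).hostname;
    } catch (err) {
      continue; // the URL parser rejected this token
    }
    // Assumption: treat "www.gaba.ch" and "gaba.ch" as the same domain
    host = host.replace(/^www\./, "");
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  return counts;
}

const sample = `
http://www.gaba.ch/fr_CH/519/Netuschil-L-et-al-Eur-J-Oral-Sci-103-1995-355-361.htm?Subnav2=ResearchProducts&Article=17516
http://www.gaba.fi/fi_FI/725/Suche.htm?Page=42
http://www.gaba.ch/fr_CH/538/Recomend-Page.htm?LinkID=576&Brand=meridolHalitosis&Subnav=&Product=312435
http://www.gaba.com/en/1071/Professor-Edwin-G-Winkel.htm
http://www.gaba.ch/fr_CH/580/Congress-Calendar.htm?CongressId=289461&Page=6
`;

const counts = countDomains(sample);
console.log(counts.size);          // number of distinct domains
console.log(counts.get("gaba.ch")); // occurrences of one domain
```

For the five sample addresses this gives three distinct domains, with gaba.ch appearing three times.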

user3210641
  • Oh, that is a good idea, but by domains I meant, for example, http://gaba.ch and http://gaba.fi, not the number of URLs. These are two different domains, and if one of them appears again somewhere, there are still only two domains. How can I modify this code? And thank you for your answer! – Natalia Kisiel Jun 04 '18 at 19:31
  • Thank you, I think it's working! Could you please help me understand what happened here? const addresses = data.match(/https?:\/\/(?:www)?\.((?:.+?)\.[\w\.]{2,5})/g); - I mean why {2,5} ? // Filter the addresses to only unique ones const unique = addresses.reduce((acc, cur) => acc.indexOf(cur) > -1 ? acc : acc.concat(cur), []); - I don't understand that line. But thank you so much! – Natalia Kisiel Jun 07 '18 at 12:27
-1

You can use the String.prototype.match() method.
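
A minimal sketch of what that could look like (the input strings and the simplified pattern are assumptions for illustration, not a full URL validator):

```javascript
const text = "http://www.gaba.ch/a http://www.gaba.fi/b";

// With the g flag, match() returns an array containing every full match.
// The pattern takes the scheme plus everything up to the next slash or whitespace.
const hosts = text.match(/https?:\/\/[^\/\s]+/g) || [];

// match() returns null when nothing matches, hence the `|| []` fallback
const none = "no links here".match(/https?:\/\/[^\/\s]+/g) || [];

console.log(hosts, none.length);
```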

mohiris