0

I scrape sites for a database with a Chrome extension and need assistance with a JavaScript clean-up function.

For example, given this URL:

https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p

my target output is:

_60789694386.html

Everything past .html needs to be removed, but since that part is different in each URL, I'm lost.

The output is a .csv file, on which I run a JavaScript script to clean up the data.

   this.values[8] = this.values[8].replace("https://www.alibaba.com/product-detail/","");

this.values[8] is how I target the column in the script (column 8 holds the URL).

Thomas
  • You could split at `?` and take the first part: `yourString.split('?')[0]` – iacobalin Mar 15 '19 at 13:41
  • This could be helpful: [Remove query string](https://stackoverflow.com/questions/2540969/remove-querystring-from-url) – ejaz Mar 15 '19 at 13:49

7 Answers

3

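Well, you can use split.

    var final = this.values[8].split('.html')[0] + '.html';

split gives you an array of the pieces of the string separated by a delimiter, in your case '.html'. You take the first piece, which drops everything after '.html'. Because split also removes the delimiter itself, '.html' is appended again to match your target output.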
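To make the behaviour of split concrete, here is a minimal sketch using the sample URL from the question. The variable names (`url`, `parts`, `fileName`) are only illustrative; in the actual script the value would come from `this.values[8]`.

```javascript
// Sample URL from the question.
var url = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";

// split returns the pieces around the delimiter; the delimiter itself is dropped.
var parts = url.split(".html");

// parts[0] is everything before ".html", parts[1] is the query string.
var fileName = parts[0].replace("https://www.alibaba.com/product-detail/", "") + ".html";

console.log(fileName); // _60789694386.html
```

This relies on '.html' appearing exactly once in the URL, which holds for the sample but is an assumption about the rest of the data.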

Mickael B.
Raul Rene
1

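Consider using substr together with indexOf to keep only the part before the `?` (this assumes every URL contains a `?`):

    this.values[8] = this.values[8].substr(0, this.values[8].indexOf('?'));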
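One caveat with the indexOf approach: when the string has no `?`, indexOf returns -1 and the result is an empty string. A small guard avoids that. The helper name `stripQuery` below is hypothetical, and slice is used in place of substr, which is a legacy method.

```javascript
// Hypothetical helper: returns the part of `url` before the first "?",
// or the whole string unchanged when there is no "?".
function stripQuery(url) {
  var q = url.indexOf("?");
  return q === -1 ? url : url.slice(0, q);
}

var withQuery = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
var withoutQuery = "https://www.alibaba.com/product-detail/_60789694386.html";

console.log(stripQuery(withQuery));    // https://www.alibaba.com/product-detail/_60789694386.html
console.log(stripQuery(withoutQuery)); // https://www.alibaba.com/product-detail/_60789694386.html
```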
Gwyn Rees
0

You can use the split method to cut the string at the `?`, as in this example:

    var link = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
    var result = link.split('?')[0].replace("https://www.alibaba.com/product-detail/", "");
    console.log(result);
0

I'm not sure I understood your problem, but try this:

    var s = 'https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p';
    s = s.substring(0, s.indexOf('?'));
    console.log(s);
Hassan ALAMI
0

For when you don't care about readability...

    this.values[8] = new URL(this.values[8]).pathname.split("/").pop().replace(".html", "");
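Note that the final `.replace(".html", "")` also strips the extension, whereas the target output in the question keeps it. A sketch of the same URL-based idea that keeps `.html` could look like this. The function name `fileNameFromUrl` is hypothetical, and the code assumes the value is always a valid absolute URL, since `new URL` throws otherwise.

```javascript
// Hypothetical helper: returns the last path segment of an absolute URL.
function fileNameFromUrl(url) {
  // pathname excludes the query string and the fragment,
  // e.g. "/product-detail/_60789694386.html".
  var path = new URL(url).pathname;
  // The last segment after "/" is the file name.
  return path.split("/").pop();
}

var sample = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
console.log(fileNameFromUrl(sample)); // _60789694386.html
```

The design choice here is to let the URL parser separate path and query instead of searching for `?` by hand.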
Niet the Dark Absol
0

Alternatively, without using split:

    var link = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
    var result = link.replace('https://www.alibaba.com/product-detail/', '').replace(/\?.*$/, '');
    console.log(result);
User863
0

You can use a regex to get it done. As far as I know, you can do something like:

    var v = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
    // Take everything after the last slash, then cut at the '?'.
    var result = v.match(/[^\/]+$/)[0];
    result = result.substring(0, result.indexOf('?'));
    console.log(result);    // logs _60789694386.html
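The two steps can probably be folded into a single regex that also copes with URLs that have no query string. This is only a sketch; the function name `lastSegment` is hypothetical and the pattern has been checked against the sample URL only.

```javascript
// Hypothetical helper: finds a run of characters that contains no "/", "?" or "#"
// and is immediately followed by "?", "#", or the end of the string.
function lastSegment(url) {
  var m = url.match(/[^\/?#]+(?=[?#]|$)/);
  // match returns null when nothing fits, so fall back to an empty string.
  return m ? m[0] : "";
}

var v = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
console.log(lastSegment(v)); // _60789694386.html
```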