2

I'm trying to web scrape the first headline from the yahoo finance press release page, so in this case: "Nutriband Inc. Signs Exclusive Manufacturing Agreement for Diocheck(TM) Visual COVID-19 Antibody Indicator Patch" from the URL: https://finance.yahoo.com/quote/NTRB/press-releases

the formula:

=IMPORTXML("https://finance.yahoo.com/quote/NTRB/press-releases", "//*[@id="summaryPressStream-0-Stream"]/ul/li[1]/div/div/div[1]/h3/a/text()")

gives me a parsing error, which I'm guessing is from the double quotation marks around "summaryPressStream-0-Stream"

Replacing the double quotations with single quotations,

i.e. =IMPORTXML("https://finance.yahoo.com/quote/NTRB/press-releases", "//*[@id='summaryPressStream-0-Stream']/ul/li[1]/div/div/div[1]/h3/a")

results in a resource not found at URL error. Maybe I have the xpath wrong? I've tried all the xpath's near that section but just can't seem to get it working.

Does anyone know how to solve this issue?

Rubén
  • 34,714
  • 9
  • 70
  • 166

2 Answers2

1

When I tested =IMPORTXML("https://finance.yahoo.com/quote/NTRB/press-releases","//*"), #N/A is returned. Unfortunately, it seems that the HTML data cannot be retrieved. But, fortunately, I noticed that when Google Apps Script is used, the HTML data could be retrieved. So in this answer, as a workaround, I would like to propose to use Google Apps Script as the custom function. The sample script is as follows.

Sample script:

Please copy and paste the following script to the script editor and save the project. And, please put a formula of =SAMPLE("https://finance.yahoo.com/quote/NTRB/press-releases") to a cell. This script is used as a custom function. By this, the value of Nutriband Inc. Signs Exclusive Manufacturing Agreement for Diocheck(TM) Visual COVID-19 Antibody Indicator Patch is returned.

function SAMPLE(url) {
  var html = UrlFetchApp.fetch(url).getContentText().match(/root.App.main = ([\s\S\w]+?);\n/);
  if (!html || html.length == 1) return;
  var obj = JSON.parse(html[1].trim());
  var res = obj.context.dispatcher.stores.StreamStore.streams["YFINANCE:NTRB.mega"].data.stream_items[0].title;
  return res || "No value";
}

Result:

When this script is used, the following result is obtained.

enter image description here

Note:

  • In this sample script, I think that all URLs might not be able to be used. When the URL is changed, this script might not be able to be used. Please be careful about this.

References:

Tanaike
  • 181,128
  • 11
  • 97
  • 165
1

If you want to get a quick overview of the press releases, you can try

function pressReleases(code){
  var url = 'https://finance.yahoo.com/quote/'+code+'/press-releases'
  var source = UrlFetchApp.fetch(url).getContentText()
  var jsonString = source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}'
  var data = JSON.parse(jsonString)
  var result=[]
  eval("data.context.dispatcher.stores.StreamStore.streams['YFINANCE:"+code+".mega'].data.stream_items").forEach(function(item){
    result.push([item.title,item.publisher,item.summary])
  })
  return (result)
}

and then, put in a cell :

=pressReleases(A1)

with A1 = NTRB according to your requirement.

Mike Steelson
  • 14,650
  • 2
  • 5
  • 20
  • 1
    Hi, I wrapped the cell formula into a index function to extract the information I need as follows =INDEX(pressReleases(A1), 1, 1) Thank you for your help, it helped immensely – DarkWingDuck Oct 13 '21 at 03:34