
My question: why does my Perl script, which works from my home laptop, fail when run on my web host? (Perhaps they have a firewall, for example. Perhaps my website needs to provide credentials. Perhaps this is in the realm of cross-site scripting. I DON'T KNOW, and I appeal for your help in understanding what the cause could be and then the solution. Thanks!)

Note that all works fine IF I run the perl script from my laptop at home.

But if I upload the perl script to my web host, where I have a web page whose javascript successfully calls that perl script, there is an error back from the site whose URL is in the perl script (finance.yahoo in this example).

To bypass the javascript, I'm just typing the URL of my perl script, e.g. http://example.com/blah/script.pl

Here is the full error message from finance.yahoo when $url starts with http:

Can't connect to finance.yahoo.com:80 nodename nor servname provided, or not known at C:/Perl/lib/LWP/Protocol/http.pm line 47.

Here is the full error message from finance.yahoo when $url starts with https:

Can't connect to finance.yahoo.com:443 nodename nor servname provided, or not known at C:/Perl/lib/LWP/Protocol/http.pm line 47.

Code:

#!/usr/bin/perl
use strict; use warnings;
use LWP 6; # one site suggested loading this "for all important LWP classes"

use HTTP::Request;

### sample of interest: to scrape historical data and feed massaged facts to my private web page via js ajax
my $url = 'http://finance.yahoo.com/quote/sbux/profile?ltr=1';

my $browser = LWP::UserAgent->new;

# one site suggested having this empty cookie jar could help
$browser->cookie_jar({});

# another site suggested I should provide WAGuess
my @ns_headers = (
 'User-Agent' => 
        # 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0',
 'Accept' => 'text/html, */*',
 'Accept-Charset' => 'iso-8859-1,*,utf-8',
 'Accept-Language' => 'en-US',
);

my $response = $browser->get($url, @ns_headers);

# for now, I just want to confirm, in my web page itself, that 
# the target web page's contents was returned
my $content = $response->content;

# show such content in my web page
print "Content-type: text/html\n\n" . $content;
shrimpwidget

2 Answers


It is not obvious what your final goal is, and you may be overcomplicating the task.

You can retrieve the page mentioned above with simpler Perl code:

#!/usr/bin/env perl
#
# vim: ai:ts=4:sw=4
#

use strict;
use warnings;
use feature 'say';

use HTTP::Tiny;

my $debug = 1;

my $url = 'https://finance.yahoo.com/quote/sbux/profile?ltr=1';

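my $response = HTTP::Tiny->new->get($url);

if ($response->{success}) {
    my $html = $response->{content};

    say $html if $debug;
}
else {
    # On connection-level failures HTTP::Tiny reports status 599
    # and puts the error text in the content field.
    warn "Request failed: $response->{status} $response->{reason}\n";
}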

In your post you indicated that JavaScript is somehow involved, but it is not clear how, or what its purpose is in retrieving the page.

The error message refers to C:/Perl/lib/LWP/Protocol/http.pm line 47, which indicates that the web hosting is taking place on a Windows machine; it would be nice to mention that in your question.
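The text "nodename nor servname provided, or not known" is the kind of message a failed host name lookup produces, so it may help to check name resolution separately from the HTTP request. Below is a minimal sketch, assuming the hosting account can run a script that uses the core Socket and HTTP::Tiny modules. The helper names resolve_host and diagnose are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Socket qw(getaddrinfo getnameinfo SOCK_STREAM NI_NUMERICHOST);
use HTTP::Tiny;

# Hypothetical helper: return the numeric addresses that $host resolves to,
# or an empty list when the lookup fails.
sub resolve_host {
    my ($host) = @_;

    my ($err, @results) = getaddrinfo($host, '80', { socktype => SOCK_STREAM });
    return () if $err;

    my @addresses;
    for my $info (@results) {
        my ($name_err, $ip) = getnameinfo($info->{addr}, NI_NUMERICHOST);
        push @addresses, $ip unless $name_err;
    }
    return @addresses;
}

# Hypothetical helper: describe how far a request to $host gets.
sub diagnose {
    my ($host) = @_;

    my @ips = resolve_host($host);
    return "DNS lookup failed for $host\n" unless @ips;

    my $response = HTTP::Tiny->new(timeout => 10)->get("http://$host/");

    # HTTP::Tiny uses status 599 for errors raised before any HTTP reply
    # arrived; the error text is then in the content field.
    return "$host resolves to @ips, but the request failed: $response->{content}\n"
        if $response->{status} == 599;

    return "$host resolves to @ips; HTTP status $response->{status} $response->{reason}\n";
}

# Run as a CGI script, or from a shell with the host name as an argument.
my $host = @ARGV ? $ARGV[0] : 'finance.yahoo.com';
print "Content-type: text/plain\n\n", diagnose($host);
```

If the lookup step fails on the host but succeeds at home, that points to the host's DNS or outbound network setup rather than to the headers or cookies the script sends.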

Could you shed some light on the purpose of the following block in your code?

# WAGuess
$browser->env_proxy;
# WAGuess
$browser->cookie_jar({});

I do not see cookie_jar being used anywhere in your code.

Do you plan to use some authentication approach to extract some data under your personal account which is not accessible otherwise?

Please state in the first few sentences what you are trying to achieve overall.

Polar Bear
  • NOTE: if you have access to the web hosting machine, test the connection to **finance.yahoo.com** on port 80 (with _telnet_ or _nc_) to be sure. I doubt that the web hosting machine is blocking traffic at the firewall level, but until you try, you will not know for sure. – Polar Bear Feb 06 '20 at 03:50
  • It may be, for example, that the OS of the hosting server is important for my goal (goal: scrape a web page with my perl script that is running on the hosting company's server). I have no idea what is important. I'm looking to helpful folks to mentor me in the right direction. – shrimpwidget Feb 06 '20 at 16:50
  • Proxy and user-agent, etc: These are the result of my searching for why the web page would not work while my local computer does. "Perhaps the yahoo target site senses I'm different from the hosting site in some way"--and I don't understand why that would be. Thus this overall post. – shrimpwidget Feb 06 '20 at 16:55
  • HTML::Tiny? About HTML::Tiny - "Lightweight, dependency free HTML/XML generation" You think I'm trying to *generate HTML*? – shrimpwidget Feb 06 '20 at 17:02
  • YOUR SUGGESTION OF HTML TINY PRODUCES THE SAME ERROR: Could not connect to 'finance.yahoo.com:80': nodename nor servname provided, or not known – shrimpwidget Feb 06 '20 at 17:10
  • @shrimpwidget Grinnz already pointed out that I use [HTTP::Tiny](https://perldoc.pl/HTTP::Tiny) and you refer to [HTML::Tiny](https://metacpan.org/pod/HTML::Tiny) -- they are completely different. It is a good indicator that you need take a brake to rest. – Polar Bear Feb 06 '20 at 18:40
  • HTTP::Tiny, then. Copied/pasted your code. Didn't work. It's break, not brake. I think the most promising lead is that the hosting site has a firewall and that I need to deal with THAT somehow in my perl script. – shrimpwidget Feb 07 '20 at 00:50
  • Thank you for your help, Polar Bear. I like the tip "web hosting is taking place on Windows machine" which you were able to find (from looking at the perl module, I suppose? cool!) – shrimpwidget Feb 12 '20 at 22:13
  • @shrimpwidget -- It was not that difficult `'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0'`. I assume that OP would put **credentials** of his system -- I use similar approach to make perl script look like Windows machine. – Polar Bear Feb 12 '20 at 23:42
  • Is that what this part does in the OP code? Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0 – shrimpwidget Feb 17 '20 at 04:40
  • @shrimpwidget -- Please see the following [documentation](https://en.wikipedia.org/wiki/User_agent), [GeeksForGeeks](https://www.geeksforgeeks.org/http-headers-user-agent/), [Google Chrome](https://developer.chrome.com/multidevice/user-agent), [HowToGeek](https://www.howtogeek.com/114937/htg-explains-whats-a-browser-user-agent/). – Polar Bear Feb 17 '20 at 07:11

Perhaps it's about cookies or about using yahoo's "query" url instead.

Yahoo Finance URL not working

shrimpwidget