
I am pulling Nintendo DS prices from this website using lynx -dump.

For example, let's say I am going to pull from the webpage for the game Yoshi Touch and Go:

/usr/bin/lynx -dump -width=150 http://videogames.pricecharting.com/game/nintendo-ds/Yoshi-Touch-and-Go

Everything works fine and I can use a regex to pull the prices easily. The problem comes when the URL contains an apostrophe (') or an ampersand (&), as that brings up an error. So let's say I try to find the page for the game Yoshi's Island DS; I would use this line of code:

/usr/bin/lynx -dump -width=150 http://videogames.pricecharting.com/game/nintendo-ds/Yoshi's-Island-DS

which would give me these little errors:

sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file

Here is the code I use to call lynx -dump, with $fullURL being the string containing "http://videogames.pricecharting.com/game/nintendo-ds/Yoshi's-Island-DS":

$command     = "/usr/bin/lynx -dump -width=150 $fullURL";
@pageFile = `$command`;

Could anyone help me find a solution that will turn the $fullURL string into a URL compatible string?
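One quick way to get a shell-safe string is to percent-encode the offending characters, since %27 and %26 are the standard URL escapes for ' and &. This is only a sketch and assumes the site resolves percent-encoded paths:

```perl
# Sketch: percent-encode the characters that break the shell command.
# Assumption: the server accepts %27 / %26 in the path (most do).
my $fullURL = "http://videogames.pricecharting.com/game/nintendo-ds/Yoshi's-Island-DS";
(my $safeURL = $fullURL) =~ s/(['&\s])/sprintf('%%%02X', ord $1)/ge;
# $safeURL is now ...nintendo-ds/Yoshi%27s-Island-DS
my $command  = "/usr/bin/lynx -dump -width=150 $safeURL";
```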

Nick
  • You may want to look at [LWP](http://p3rl.org/LWP) and [LWP::Simple](http://p3rl.org/LWP::Simple) for better ways to get the contents of a web page than using the shell to call `lynx`. – Ven'Tatsu Apr 24 '12 at 19:31
  • I will definitely keep that in mind. I was making a DS price checker program for a final project in my perl class and we had done an assignment earlier this semester that used the same method of using lynx to dump the contents of a page. That's why I kept the same method for this project as well. I just finished it all up and it works well, albeit not too efficient and takes a while to process all the games. Thanks for the idea though! :) – Nick Apr 24 '12 at 20:30

2 Answers


You need to escape the ' in your URL before it is passed to the shell. Perl provides the quotemeta function to perform the needed escapes for most shells.

my $quoted_URL = quotemeta($fullURL);
$command     = "/usr/bin/lynx -dump -width=150 $quoted_URL";
...

You can also use the \Q and \E escapes in the string for the same result.

$command     = "/usr/bin/lynx -dump -width=150 \Q$fullURL\E";
...
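For illustration, quotemeta simply puts a backslash before every character that is not a word character, which is enough for Bourne-style shells (a short demo with a shortened example URL):

```perl
# quotemeta backslash-escapes every non-word character in the string.
my $fullURL    = "http://example.com/Yoshi's-Island-DS";   # shortened example
my $quoted_URL = quotemeta($fullURL);
print "$quoted_URL\n";   # http\:\/\/example\.com\/Yoshi\'s\-Island\-DS
```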
Ven'Tatsu

The correct way to deal with this problem is to avoid the shell entirely by using the list form of system or a piped open (a replacement for the qx/backtick operator); see Perl equivalent of PHP's escapeshellarg.

use autodie qw(:all);
open my $lynx, '-|', qw(/usr/bin/lynx -dump -width=150), $fullURL;
my @pageFile = <$lynx>;
close $lynx;

In the rare cases where this is not practical, proper shell quoting is provided through String::ShellQuote and Win32::ShellQuote.
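When installing a module is not an option, the POSIX-shell quoting that String::ShellQuote performs can be hand-rolled. This is a minimal sketch: shell_quote_arg is a hypothetical helper, and it covers Bourne-style shells only, not Win32:

```perl
# Wrap an argument in single quotes, turning each embedded ' into '\''
# so the shell treats the whole value as one literal word.
sub shell_quote_arg {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;
    return "'$s'";
}

my $fullURL = "http://videogames.pricecharting.com/game/nintendo-ds/Yoshi's-Island-DS";
my $command = "/usr/bin/lynx -dump -width=150 " . shell_quote_arg($fullURL);
```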

daxim
  • just out of curiosity, why is using the shell so bad to pull from a URL? – Nick Apr 25 '12 at 00:52
  • The question must be, why should you prefer avoiding the shell, instead passing parameters to the execve system call without further interpretation? It's more efficient: you save one process per program launch. It's more secure: you eliminate the whole class of shell injection bugs. It's more robust: characters such as `'` or `&` need no special treatment. – daxim Apr 25 '12 at 06:54