Pipe the output of a Perl subprocess in Python to an array in Python

Question

I've got a Perl program. Its output looks something like this:

http://www.site.com/file1.html
http://www.site.com/file2.html
http://www.site.com/file3.html
.
.
.
v

I've got an unfinished Python program. Here it is:

import subprocess

pipe = subprocess.Popen(["perl", "perl_program.pl"])

I run the Python program in my terminal like this:

whatever:~ whatever$ python python_program.py

I get the following:

http://www.site.com/file1.html
http://www.site.com/file2.html
http://www.site.com/file3.html
.
.
.
v

I want to pop these URLs into an array in my Python code and manipulate them within Python. How do I do that?

Here is the Perl program I am working with:

use LWP::Simple;
use HTML::TreeBuilder;
use Data::Dumper;

my $tree = url_to_tree( 'http://www.registrar.ucla.edu/schedule/schedulehome.aspx' );

my @selects  = $tree->look_down( _tag => 'select' );
my @quarters = map { $_->attr( 'value' ) } $selects[0]->look_down( _tag => 'option' );
my @courses  = map { my $s = $_->attr( 'value' ); $s =~ s/&/%26/g; $s =~ s/ /+/g; $s } $selects[1]->look_down( _tag => 'option' );

my $n = 0;

my %hash;

for my $quarter ( @quarters )
{
    for my $course ( @courses )
    {
        my $tree_b = url_to_tree( "http://www.registrar.ucla.edu/schedule/crsredir.aspx?termsel=$quarter&subareasel=$course" );

        my @options = map { my $s = $_->attr( 'value' ); $s =~ s/&/%26/g; $s =~ s/ /+/g; $s } $tree_b->look_down( _tag => 'option' );

        for my $option ( @options )
        {
            print "trying: http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option\n";

            my $content = get( "http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option" );

            next if $content =~ m/No classes are scheduled for this subject area this quarter/;

            $hash{"$course-$option"} = 1;
            #my $tree_c = url_to_tree( "http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option" );

            #my $table = ($tree_c->look_down( _tag => 'table' ))[2]->as_HTML;

            #print "$table\n\n\n\n\n\n\n\n\n\n";

            $n++;
        }
    }
}

my $hash_count = keys %hash;
print "$n, $hash_count\n";

sub url_to_tree
{
    my $url = shift;

    my $content = get( $url );

    my $tree = HTML::TreeBuilder->new_from_content( $content );

    return $tree;
}

How are the URLs being generated? Would it be possible to port that code from perl to python or port the rest of the code from python to perl? — hd1, Mar 06 '14 at 05:30
@hd1, yeah, I don't know, I don't even know what you're saying really, but I think if you explain it a bit more I might follow... — user3333975, Mar 06 '14 at 05:33

icedtrees · Answer 1 · 2014-03-06T05:42:12.727

3

Try this:

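import subprocess

# universal_newlines=True makes communicate() return text rather than bytes on Python 3
pipe = subprocess.Popen(["perl", "perl_program.pl"], stdout=subprocess.PIPE, universal_newlines=True)
urls, stderr = pipe.communicate()  # stderr is None here because it is not piped
urls = urls.split("\n")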

# urls is the array that you can now manipulate

Alternatively, with Python 2.7 or higher you can use check_output:

urls = subprocess.check_output(["perl", "perl_program.pl"], universal_newlines=True).split("\n")

If you use split with no arguments, as suggested by @J.F.Sebastian, the URL splitting gets even cleaner, because superfluous newlines and other whitespace are dropped from the resulting list. Use this in place of the split("\n") call:

urls = urls.split()
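
To make the difference concrete, here is a small sketch that uses a hard-coded string in place of the Perl program's real output. The `sample_output` value is made up for illustration (including one Windows-style `\r\n` line ending), so treat it as an assumption rather than what your program actually prints.

```python
# Hypothetical stand-in for the text captured from the Perl program.
sample_output = (
    "http://www.site.com/file1.html\n"
    "http://www.site.com/file2.html\r\n"
    "http://www.site.com/file3.html\n"
)

# Splitting on "\n" keeps the stray "\r" and leaves an empty string at the end.
by_newline = sample_output.split("\n")

# Splitting with no argument splits on any run of whitespace and drops empties.
by_whitespace = sample_output.split()

print(by_newline)
print(by_whitespace)
```

With `split("\n")` you would have to filter out the trailing empty entry yourself; `split()` avoids that, which is safe here because URLs contain no whitespace.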

edited Mar 06 '14 at 05:42

answered Mar 06 '14 at 05:25

you could use `split()` (no argument): it removes newlines (including `\r\n`, like `.splitlines(keepends=False)`) and it trims other whitespace (there can't be whitespace inside url so it behaves like `.strip()`) – jfs Mar 06 '14 at 05:34
I'm going to try your approach icedtrees. BTW, what is it with people on stackoverflow having arctic themed names? Like chill something, ice something. Is ice in vogue or something? – user3333975 Mar 06 '14 at 05:36
Wait, my URL list is about ten thousand lines... It takes a while to process... How will I know that I've got them in Python? How can I show the processing and spit out the first element in urls to the console, so I know I've got what I want? – user3333975 Mar 06 '14 at 05:42
@user3333975: see [read subprocess stdout line by line](http://stackoverflow.com/q/2804543/4279) – jfs Mar 06 '14 at 05:47
OK, I'll do that now. I'm running your code now without it and printing out the first element in urls to see if I've got it. – user3333975 Mar 06 '14 at 05:52
No problem. If this answer meets your requirements, you should mark it as your answer. (Additional note: all the cool people have ice in their name) – icedtrees Mar 07 '14 at 07:22
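
For the follow-up about a long-running program with roughly ten thousand lines of output, one possible approach (in the spirit of the question jfs linked) is to read the child's stdout line by line instead of waiting for communicate() to return. The sketch below is not from the answer itself: the `stream_lines` helper and `demo_cmd` are hypothetical names, and a tiny Python child process stands in for the Perl program so the example runs without Perl installed.

```python
import subprocess
import sys


def stream_lines(cmd):
    """Yield each line of the child's stdout as it becomes available."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()


# Stand-in child process (an assumption for this demo);
# swap in ["perl", "perl_program.pl"] for the real program.
demo_cmd = [
    sys.executable,
    "-c",
    "print('http://www.site.com/file1.html'); print('http://www.site.com/file2.html')",
]

urls = []
for url in stream_lines(demo_cmd):
    print("got:", url)  # progress is printed as each line is read
    urls.append(url)
```

Note that Perl usually block-buffers STDOUT when it is writing to a pipe, so lines may still arrive in bursts unless the Perl script enables autoflush (for example with `$| = 1;`).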
