2

I am using Perl with WWW::Mechanize to download an MP3 file which is served in chunks of 400KB (around 20 seconds).

When I save the data with binmode on the file handle, appending each chunk as it arrives, only the first chunk is played correctly; the rest is not.

When I don't use binmode I can't play the whole file -- it plays but sounds interesting!

This is my program:

use WWW::Mechanize;

$agent = WWW::Mechanize->new( cookie_jar => {} );

@links = ("http://thehost.com/chunk1","http://thehost.com/chunk2","http://thehost.com/chunk3");

foreach (@links){
    $agent->get($_);

    my $filename = 'test.mp3';
    open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
    binmode $fh;
    print $fh $agent->content;
    close $fh;
}

What am I doing wrong?

Update

These are the HTTP headers that are being returned.

Cache-Control: public
Connection: close
Date: Tue, 28 Oct 2014 18:38:37 GMT
Pragma:
Server: Apache
Content-Length: 409600
Content-Type: application/octet-stream
Expires: Sat, 24 Oct 2015 12:08:00 GMT
Access-Control-Allow-Origin: *
Client-Date: Tue, 28 Oct 2014 18:38:28 GMT
Client-Peer: **.**.***.***:80
Client-Response-Num: 1
Borodin
communications
  • I've tried to make sense of your question and understand what exactly is happening, but your English isn't clear. The problem is with *"When I don't use binmode I can't play the whole file"* -- you seem to be saying that, with or without `binmode`, only the first chunk plays correctly. Is that right? – Borodin Oct 29 '14 at 01:24
  • The first file only plays with binmode – communications Nov 08 '14 at 12:37

3 Answers

1

I suspect the content is served with incorrect headers, and as you are using the API that automatically decodes, this corrupts the octet stream.

Use the mirror method instead and concatenate the files after downloading.
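A minimal sketch of this approach, assuming the chunk URLs are passed on the command line. The part file names (`chunk_0.part`, ...) and the helper `concat_files` are hypothetical names chosen for illustration. `mirror` saves each response body straight to disk, and the parts are then joined in raw mode so that no I/O layer alters the bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: join already-downloaded part files byte for byte.
sub concat_files {
    my ($out_name, @parts) = @_;
    open my $out, '>:raw', $out_name or die "Cannot write '$out_name': $!";
    for my $part (@parts) {
        open my $in, '<:raw', $part or die "Cannot read '$part': $!";
        local $/ = \65536;    # read fixed-size 64 KB blocks, not lines
        print {$out} $_ while <$in>;
        close $in;
    }
    close $out or die "Cannot close '$out_name': $!";
    return;
}

# Only touch the network when URLs are given, e.g.
#   perl fetch.pl http://thehost.com/chunk1 http://thehost.com/chunk2
if (@ARGV) {
    require LWP::UserAgent;
    my $agent = LWP::UserAgent->new;
    my @parts;
    for my $i (0 .. $#ARGV) {
        my $part = "chunk_$i.part";    # assumed naming scheme
        my $resp = $agent->mirror($ARGV[$i], $part);
        # mirror answers 304 when the local copy is already up to date
        die "$ARGV[$i]: " . $resp->status_line
            unless $resp->is_success || $resp->code == 304;
        push @parts, $part;
    }
    concat_files('test.mp3', @parts);
}
```

Keeping the individual part files around also makes it possible to try each one in a player on its own, or to compare them with copies fetched by another tool.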

daxim
1

I doubt that a single MP3 file is simply split after some number of bytes and the pieces offered as separate downloads. Instead, I assume that each URL serves a separate, valid MP3 file containing 20 seconds of the original. Because an MP3 file is not just data but header plus data, you cannot simply merge these files by concatenating them. Instead you must use a program like ffmpeg to create a single MP3 file from multiple MP3 files; see https://superuser.com/questions/314239/how-to-join-merge-many-mp3-files
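One way to test this assumption is to look at the first bytes of each downloaded chunk. The sketch below uses a hypothetical helper, `chunk_kind`, and assumes the chunks have already been saved as files named on the command line. It relies on two facts about the format: an ID3v2 tag begins with the ASCII bytes `ID3`, and an MPEG audio frame header begins with eleven set bits.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the first bytes of a downloaded chunk.
sub chunk_kind {
    my ($bytes) = @_;
    return 'id3v2'   if substr($bytes, 0, 3) eq 'ID3';
    return 'unknown' if length($bytes) < 2;
    my ($b0, $b1) = unpack 'C2', $bytes;
    # An MPEG audio frame header starts with 11 set bits (the frame sync).
    return 'mpeg-frame' if $b0 == 0xFF && ($b1 & 0xE0) == 0xE0;
    return 'unknown';
}

# Report on any files named on the command line.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot read '$file': $!";
    my $head = '';
    read $fh, $head, 4;
    close $fh;
    printf "%s: %s\n", $file, chunk_kind($head);
}
```

If every chunk reports `id3v2` or `mpeg-frame`, the chunks are more likely to be standalone files. A chunk cut at an arbitrary byte offset will usually report `unknown`, although it can start on a frame boundary by chance, so treat the result as a hint rather than proof.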

Steffen Ullrich
  • Chunks 2 onwards don't have headers; I think the file has simply been split. The last chunk contains the file ending that every MP3 has (ªªªªªªªª etc.) – communications Oct 28 '14 at 18:08
  • In that case it would be interesting to see what the server really returns, especially any content encodings in the header. Did you try to get the data with wget and then join them with `cat`? That is essentially what your Perl program is doing. Maybe the data are corrupt? – Steffen Ullrich Oct 28 '14 at 19:07
  • (Translated from German) Is there a way to contact you directly? I would rather not distribute files here, because I don't want to end up with any copyright problems. Unfortunately I couldn't find an email address on your website... – communications Oct 28 '14 at 19:26
  • (The comment above was originally posted in German.) @communications: I'm afraid we really can't help you any further without knowing the real URL. Is that possible, please? – Borodin Oct 28 '14 at 19:55
  • You can contact me at sullr AT cpan DOT org. – Steffen Ullrich Oct 28 '14 at 20:52
  • @SteffenUllrich: Thank you, I've sent you the files! Thank you for your help! @Borodin: I could mail them to you too if you are interested. You would be doing me a great favor if you were able to help me out! – communications Oct 28 '14 at 21:32
1

I can't explain the behaviour that you're getting, but WWW::Mechanize is intended for working with HTML text pages, and isn't that good with binary data. Using the LWP::UserAgent module directly isn't at all hard.

I suggest you use something like this instead.

use strict;
use warnings;
use 5.010;
use autodie;

use LWP;

my @links = qw(
  http://thehost.com/chunk1
  http://thehost.com/chunk2
  http://thehost.com/chunk3
);

my $agent = LWP::UserAgent->new;

my $filename = 'test.mp3';
open my $fh, '>:raw', $filename;

for my $link (@links) {
    my $resp = $agent->get($link);
    die $resp->status_line unless $resp->is_success;
    print $fh $resp->decoded_content;
}

close $fh;

If you still have problems then please add a line like this

print $resp->headers_as_string, "\n\n";

right after the get call, and report back with the results you get.

You may also get a different result by using the content method instead of decoded_content.
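To see whether the two methods actually differ for these responses, it can help to compare both body lengths with the Content-Length header. This is a rough sketch: `body_report` is a hypothetical helper, and the URLs are assumed to be given on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise the body sizes of one response object.
# Works with anything offering content, decoded_content and header,
# such as the HTTP::Response returned by LWP::UserAgent's get.
sub body_report {
    my ($resp) = @_;
    my $raw     = $resp->content;
    my $decoded = $resp->decoded_content;    # undef if decoding fails
    return {
        declared => $resp->header('Content-Length'),
        raw      => length($raw),
        decoded  => (defined $decoded ? length($decoded) : undef),
        same     => ((defined $decoded && $decoded eq $raw) ? 1 : 0),
    };
}

# Only touch the network when URLs are given on the command line.
if (@ARGV) {
    require LWP::UserAgent;
    my $agent = LWP::UserAgent->new;
    for my $url (@ARGV) {
        my $resp = $agent->get($url);
        die "$url: " . $resp->status_line unless $resp->is_success;
        my $r = body_report($resp);
        printf "%s: declared=%s raw=%d decoded=%s same=%d\n",
            $url, $r->{declared} // 'n/a', $r->{raw},
            $r->{decoded} // 'n/a', $r->{same};
    }
}
```

If `raw`, `decoded` and `declared` all agree for every chunk, the bytes are probably reaching the file unchanged, and the problem is more likely to lie in the data itself than in the download code.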

Of course it may help us a lot if you could give out the real URLs, but I realise that you may not be able to do that.

Borodin
  • None of your ideas worked... Here's the headers_as_string result: http://pastebin.com/T6K6H04j Does this help? – communications Oct 28 '14 at 18:42
  • And are the headers the same for all three URLs? (Apart from the date-times, obviously.) Also, please would you change the last line of the loop to `my $content = $resp->decoded_content; print length($content), "\n"; print $fh $content;` and say what result you get. – Borodin Oct 28 '14 at 20:03
  • @communications: Everything looks fine there. May I know the *form* of the URLs? Perhaps the whole URLs with the host name removed? As for the files you are downloading, I suggest you put them on [MediaFire](https://www.mediafire.com/) or similar and let me have a link. It is looking increasingly as though either the URLs you are using are wrong, or the data just isn't what you expect it to be. It would also be good to know whether the second and subsequent chunks play correctly on their own. – Borodin Oct 28 '14 at 23:31