
I'm working on a script to download files and am struggling to pass two variables into a function.

Imagine I have a data frame with two columns: url and index. I want to download the file for every url and save it as the index plus a suffix (1.mov, 2.mov, etc.).

import pandas as pd
import numpy as np
import os
import urllib.request
directory = 'videos/'

def download_multimedia(url, index):

    try:
        url = (url)
        filename = os.path.join(index + '.mov')

        # Download file
        fullpath = os.path.join(directory, filename)
        urllib.request.urlretrieve(url, fullpath)

    except:
        filename   = np.nan

    return filename

So I tried to pass the values from the two columns into the function inside a list comprehension.

downloads = [download_multimedia(url, index) for url, index in data.videourl, data.index]

However, this gives me:

ValueError: The truth value of a RangeIndex is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can the issue be solved, i.e. how do I handle the input information for each row correctly?

Thanks in advance!

Christopher
  • It may help to know which values in the data frame cause this error. Does it mainly happen with a valid URL or an invalid URL? You could also try: for (url, index) in (data.videourl, data.index): temp = download_multimedia(url, index); print(temp, type(temp)) ... to see where the error comes in. https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o seems to have a lot of information about this specific error, too. – aschultz Aug 05 '19 at 12:08
  • I think the core question is whether the list comprehension is written correctly. There's no error when I try one of the columns alone. – Christopher Aug 05 '19 at 12:19
  • Also tried: 'for url, index in zip(data.videourl, data.index): download_multimedia(url, index)' – Christopher Aug 05 '19 at 12:29

1 Answer


Judging from the error, you are probably using data.index somewhere in your code in a boolean context. To reproduce it, you can simply create a data frame and use its index like this:

data  = pd.DataFrame(some_dictionary)
if data.index:
    print(1)

Executing the code above raises the error you are getting, so look for this kind of situation in your code and change it, for example by testing data.index.empty or len(data.index) instead.

Now, assuming you have corrected this, I can see two other issues with your code.

Issue 1:

 filename = os.path.join(index + '.mov')

Here index is an int, so concatenating it with a string raises a TypeError. The bare except in your function swallows that error and stores np.nan in filename.

You can use the line below instead:

 filename = os.path.join(str(index) + '.mov')
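To see why the original code quietly returned a missing value instead of failing, here is a minimal sketch that isolates the filename step. The helper names build_filename_unsafe and build_filename are hypothetical, and None stands in for the np.nan used in the question so that the snippet needs no third-party imports.

```python
def build_filename_unsafe(index):
    # Mirrors the pattern from the question: int + str raises TypeError,
    # and the bare except hides it.
    try:
        return index + '.mov'
    except:
        return None  # the question stored np.nan here


def build_filename(index):
    # Converting to str first works for ints and strings alike.
    return str(index) + '.mov'


print(build_filename_unsafe(1))  # prints None: the TypeError was swallowed
print(build_filename(1))         # prints 1.mov
```

Catching only the exceptions you expect from the download itself would have let the TypeError surface right away.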

Issue 2:

downloads = [download_multimedia(url, index) for url, index in data.videourl, data.index]

The syntax of the list comprehension above is invalid. The two columns have to be paired with zip, so the correct syntax is:

downloads = [download_multimedia(url, index) for url, index in zip( data.videourl, data.index)]

With issue 1 and issue 2 taken care of, I was able to download the media. Hope this helps :)
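Putting both fixes together, a sketch of the whole thing could look like the following. It is not the only way to do it and makes a few assumptions that are not in the question: the fetch parameter and the download_all helper are hypothetical additions (fetch makes it possible to try the function without network access), failures return None rather than np.nan, and only OSError (which covers urllib's URLError and HTTPError) and ValueError are caught.

```python
import os
import urllib.request


def download_multimedia(url, index, directory='videos',
                        fetch=urllib.request.urlretrieve):
    """Save url as <directory>/<index>.mov; return the filename, or None on failure."""
    filename = str(index) + '.mov'  # str() avoids the int + str TypeError
    fullpath = os.path.join(directory, filename)
    try:
        os.makedirs(directory, exist_ok=True)
        fetch(url, fullpath)
    except (OSError, ValueError) as exc:
        # URLError and HTTPError are subclasses of OSError;
        # ValueError covers malformed URLs.
        print(f'Download failed for {url}: {exc}')
        return None
    return filename


def download_all(urls, indices, **kwargs):
    # zip pairs each url with its index, one pair per row.
    return [download_multimedia(url, index, **kwargs)
            for url, index in zip(urls, indices)]
```

With the data frame from the question, the call would be downloads = download_all(data.videourl, data.index).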