I have a function named tmp that simply returns the two strings it is given. I also have two iterables that I want to pass to tmp: one has about 88,000 items and the other has 50. I want to move on to the next item of the second iterable after every 200 iterations of the first, but I can't work out how to iterate over the second iterable that way. Here is what I've done so far.

Code:

from itertools import repeat


url_list = [] # contains over 80000 urls 
files = [] # contains 50 files

def tmp(url, file):
    return url, file
    
# I want to use the file for only 200 URLs and then change it and use the next one in the list(files) provided
list(map(tmp, url_list, map(lambda x: repeat(x, 200), files)))

Expected output:

url1, file1
url2, file1
url3, file1
.
.
url201, file2
url202, file2
.
.
.
url401 file3
url402 file3
.
.

Any help would be highly appreciated.

Xus
  • Can you please show a small example of the sequence you want with iterables containing say 6 and 2 strings, respectively? – Mad Physicist Jun 28 '22 at 18:10
  • What output are you expecting (again, easier with a small example)? This is why constructing a [mcve] is so important. – Mad Physicist Jun 28 '22 at 18:11
  • Also, what is the problem with the code you have? – Mad Physicist Jun 28 '22 at 18:11
  • @MadPhysicist thanks for your response, I will edit the question and add an example. – Xus Jun 28 '22 at 18:12
  • `50 * 200` is only 10000. How will you use that with 88000 in the first iterable. Does it need to go back to the beginning of the second iterable? – Barmar Jun 28 '22 at 18:13
  • That's not the problem @Barmar I will increase the number of files or just like you said I'll go back to the beginning of the second iterable. – Xus Jun 28 '22 at 18:16
  • I know it's not the problem you were asking about, I'm just asking for clarification of what you expect it to do. You'll need to use `cycle` to go back to the beginning. – Barmar Jun 28 '22 at 18:19
  • `repeat` returns a list of lists. Try flattening that. `list(map(tmp, url_list, [f for ff in repeat(files, 200) for f in ff]))` Or `list(map(tmp, url_list, itertools.chain.from_iterable(repeat(files, 200))))` – 001 Jun 28 '22 at 18:22
  • I think for now I go with the second approach which means going back to the beginning of the second iterable. – Xus Jun 28 '22 at 18:22

2 Answers

Rather than repeating files 200 times, split url_list into chunks of 200. See "How do I split a list into equally-sized chunks?" for various ways to code this.

Use itertools.cycle() to go back to the beginning of files when you reach the end.

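import itertools

result = []

# chunks() is one of the helpers from the linked question; it yields 200-URL slices.
for url_chunk, file in zip(chunks(url_list, 200), itertools.cycle(files)):
    result.extend([(url, file) for url in url_chunk])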
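
For completeness, here is a minimal self-contained sketch of the same idea. The chunks helper is just one simple slicing variant of what the linked question offers, and pair_urls_with_files and per_file are hypothetical names chosen for illustration; it also assumes url_list is a list (or another sliceable sequence).

```python
import itertools


def chunks(seq, size):
    """Yield successive slices of seq, each at most `size` items long."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def pair_urls_with_files(url_list, files, per_file=200):
    """Pair each URL with a file, switching files every `per_file` URLs.

    itertools.cycle() restarts from the first file once the files run out.
    """
    result = []
    for url_chunk, file in zip(chunks(url_list, per_file), itertools.cycle(files)):
        result.extend((url, file) for url in url_chunk)
    return result


# Small demonstration: 7 URLs, 2 files, 2 URLs per file.
urls = [f"url{i + 1}" for i in range(7)]
files = ["file1", "file2"]
pairs = pair_urls_with_files(urls, files, per_file=2)
print(pairs)
```

With these small inputs, url1 and url2 get file1, url3 and url4 get file2, and url5 starts again at file1.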
Barmar

You can try this, for example:

import pprint

url_list = ["url" + str(i+1) for i in range(20)]
file_list = ["file" + str(i+1) for i in range(5)]
every_n = 3

result = [ (url_list[i], file_list[min(i // every_n, len(file_list)-1)])
           for i in range(len(url_list)) ]

pprint.pprint(result)

The output of the above script (once the files run out, the last one is reused for the remaining URLs):

[('url1', 'file1'),
 ('url2', 'file1'),
 ('url3', 'file1'),
 ('url4', 'file2'),
 ('url5', 'file2'),
 ('url6', 'file2'),
 ('url7', 'file3'),
 ('url8', 'file3'),
 ('url9', 'file3'),
 ('url10', 'file4'),
 ('url11', 'file4'),
 ('url12', 'file4'),
 ('url13', 'file5'),
 ('url14', 'file5'),
 ('url15', 'file5'),
 ('url16', 'file5'),
 ('url17', 'file5'),
 ('url18', 'file5'),
 ('url19', 'file5'),
 ('url20', 'file5')]

Remark:

  • // is integer (floor) division, e.g. 5 // 2 = 2
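
If you would rather go back to the first file when the files run out (as discussed in the comments) instead of reusing the last one, one possible variant replaces min(...) with a modulo. This is only a sketch; assign_files is a hypothetical name, and it assumes file_list is non-empty.

```python
def assign_files(url_list, file_list, every_n):
    """Pair each URL with a file, advancing one file every `every_n` URLs.

    The modulo wraps the file index around to 0 after the last file.
    """
    n_files = len(file_list)
    return [(url, file_list[(i // every_n) % n_files])
            for i, url in enumerate(url_list)]


# Small demonstration: 8 URLs, 2 files, a new file every 3 URLs.
urls = ["url" + str(i + 1) for i in range(8)]
files = ["file1", "file2"]
result = assign_files(urls, files, every_n=3)
print(result)
```

Here url1 to url3 get file1, url4 to url6 get file2, and url7 and url8 wrap around to file1.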