The following reproducible script is intended to process the rows of a 5-row DataFrame two at a time. For each pair of processed rows, I would like to print a list of the items that were processed.

import pandas as pd
import itertools

my_dict = { 
'name' : ['a', 'b', 'c', 'd', 'e'],
'age' : [10, 20, 30, 40, 50]
}

df = pd.DataFrame(my_dict)

for index, row in itertools.islice(df.iterrows(), 2):
    rowlist = row['name']  # row.name would return the index label, not the 'name' column
    print('Processed two rows {}'.format(rowlist))

Output:

a
b

I'm looking for a way to get the following desired output:

[a,b]
[c,d]
[e]

Tried:

print(df.groupby(df.index//2)['name'].agg(list))

Out:

0    [a, b]
1    [c, d]
2       [e]
Name: name, dtype: object
0    [a, b]
1    [c, d]
2       [e]
Name: name, dtype: object

Thanks for your help!

Christopher
  • Use `df.groupby(df.index//2)['name'].agg(list)` – anky Aug 11 '19 at 11:20
  • It prints the output three times? – Christopher Aug 11 '19 at 11:23
  • No. If you are running this inside a loop, you will have to check the loop, but on its own (outside a loop) this works. – anky Aug 11 '19 at 11:24
  • anky's answer is superior to a loop. Why would you want to use a loop here? Do you have a specific reason? @Christopher – Erfan Aug 11 '19 at 11:26
  • I am closing this since it is a duplicate, let me know if.... – anky Aug 11 '19 at 11:27
  • Well, my real list is very long and I have to apply multithreading to every x items. `pool.map()` only accepts a list of files to process. I checked the duplicate link, but it seems to relate to a different question? – Christopher Aug 11 '19 at 11:27
  • Try `df.groupby(df.index//2)['name'].agg(list).tolist()`, which gives you a list of lists, or `df.groupby(df.index//2)['name'].agg(list).to_numpy()`, which gives you an array of lists. – Erfan Aug 11 '19 at 11:29
  • OK, reopened. @Christopher, please edit the question with more details – anky Aug 11 '19 at 11:30
  • I tried the code above, but see what the output is. It prints twice? – Christopher Aug 11 '19 at 11:37
  • Okay, getting closer: when using `.agg(list).tolist()`, the result looks like `[['0_movie.mp4', '1_movie.mp4', '3_movie.mp4']]`. However, the desired output is a list of the form `['0_movie.mp4', '1_movie.mp4', '3_movie.mp4']`. There's one pair of brackets too many? – Christopher Aug 11 '19 at 12:01
  • Access the first element then: `.agg(list).tolist()[0]`. This is really basic Python functionality. – Erfan Aug 11 '19 at 12:49
  • Thanks for your help. I could not implement what I was aiming for with the script above, but a chunker function from this thread helped me: [link](https://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks) – Christopher Aug 11 '19 at 15:45
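
For readers landing here, the chunker approach mentioned in the last comment might look roughly like the sketch below. It is not the exact code from the linked thread. The names `chunker` and `process_chunk` are hypothetical, `names` stands in for `df['name'].tolist()` from the question, and `ThreadPool` is only one possible way to cover the "multithreading on every x items" requirement from the comments.

```python
from multiprocessing.pool import ThreadPool


def chunker(seq, size):
    """Yield consecutive slices of seq, each holding at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


# Stand-in for df['name'].tolist() from the question.
names = ['a', 'b', 'c', 'd', 'e']

# Split the names into groups of two; the last group may be shorter.
chunks = list(chunker(names, 2))
for chunk in chunks:
    print(chunk)


def process_chunk(chunk):
    # Hypothetical placeholder for the real per-chunk work (e.g. handling files).
    return [item.upper() for item in chunk]


# pool.map() takes the list of chunks and returns the results in the same order.
with ThreadPool(2) as pool:
    results = pool.map(process_chunk, chunks)
```

Slicing keeps each chunk a plain Python list, so each one can be handed to `pool.map()` directly without the extra nesting seen with `.agg(list).tolist()`.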
