How to create a 3D list of 2D lists that vary in size in Python?

Question

I have a dataframe like this. The column called filename indicates which file each row belongs to. I have also created another column that counts the total number of rows in each file.

I have extracted the ymin and ymax by concatenating all of them to a list and the result is a 2D list:

y = [[4, 43], [9, 47], [76, 122], [30, 74], [10, 47], [81, 125], [84, 124], [47, 90], [1, 38],[2, 40], [2, 44], [4, 48], [5, 48], [6, 44], [8, 45], [75, 116], [73, 123], [28, 73], [39, 84], [84, 121], [2, 39],...]

However, this only puts all the coordinates into one list, without recording which pairs belong to the first file and which belong to the second.

My approach is making a 3D list like this:

y = [[[4, 43], [9, 47], [76, 122], [30, 74], [10, 47], [81, 125], [84, 124], [47, 90], [1, 38]],[[2, 40], [2, 44], [4, 48], [5, 48], [6, 44], [8, 45], [75, 116], [73, 123], [28, 73], [39, 84], [84, 121], [2, 39]],...]

You can see from [4,43] to [1,38] that they are in the same file. You can also see from [2,40] to [2,39] that they are also in the same file.

Here is my current attempt:

def get_y_coordinate(count):
    """
    Create a 3D list of y coordinates in which the [ymin, ymax] pairs are grouped by the file they belong to
    :param: count - the list taken from the column "count" from the dataframe
    """
    c = 2 # Size of each chunk
    fi_y= lambda y, c: [y[i:i+c] for i in range(0, len(y), c)] # Split y into chunks of 2, e.g. from [4,43,9,47] to [[4,43],[9,47]]
    y = fi_y(y,c) # Now y is [[4, 43], [9, 47],...]

    # This is my current approach, I create a new list called bigy. 

    bigy = []   
    current = 0 
    for i in count:
        if current != i:
            bigy.append([y[j] for j in range(current, i+current)])
        current = i
        
    return bigy
>> bigy = [[[4, 43], [9, 47], [76, 122], [30, 74], [10, 47], [81, 125], [84, 124], [47, 90], [1, 38]],[[2, 40], [2, 44], [4, 48], [5, 48], [6, 44], [8, 45], [75, 116], [73, 123], [28, 73], [39, 84], [84, 121], [2, 39]],...]

This gives the expected result for the first few hundred files. However, from around file 700 onward, it no longer works. I need another approach to this problem, if anyone is patient enough to read this far and help me out. Thank you very much!

conceptually you want an array of min/max tuples for each "file" right? Also, when you say it does not work past about 700, do you get an error or is it just too slow? — JonSG, Jun 07 '21 at 12:59
Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby) — JonSG, Jun 07 '21 at 13:44

JonSG · Answer 1 · 2021-06-07T13:44:20.940

I think my first inclination would be to just iterate over the dataframe and collect results in a defaultdict. Maybe something like:

import collections
import pandas

mock_data = pandas.DataFrame([
    {"Name": "product_name", "ymin": 4, "ymax": 43},
    {"Name": "product_name", "ymin": 9, "ymax": 47},
    {"Name": "product_total_money", "ymin": 76, "ymax": 122},
    {"Name": "vat", "ymin": 30, "ymax": 74},
    {"Name": "product_name", "ymin": 10, "ymax": 47}
])

y_results = collections.defaultdict(list)
for _, row in mock_data.iterrows():
    y_results[row["Name"]].append((row["ymin"], row["ymax"]))

print(y_results)
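
To get the nested list from the question rather than a dict, the same idea can be keyed on the file and the values collected at the end. This is only a minimal sketch: the `filenames` list and the sample pairs are made up for illustration, and it assumes `filenames` is parallel to the flat list `y` (one file name per [ymin, ymax] pair).

```python
import collections

# Hypothetical inputs: one file name per [ymin, ymax] pair.
filenames = ["a.jpg", "a.jpg", "b.jpg", "a.jpg", "b.jpg"]
y = [[4, 43], [9, 47], [2, 40], [76, 122], [2, 44]]

# Collect the pairs of each file under its name.
by_file = collections.defaultdict(list)
for name, pair in zip(filenames, y):
    by_file[name].append(pair)

# Dicts keep insertion order (Python 3.7+), so the groups come out
# in the order each file was first seen.
bigy = list(by_file.values())
print(bigy)
```

Because the grouping is done by name, the rows of one file do not have to be contiguous and no per-file row count is needed.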

Alternately, you might also try:

mock_data.groupby('Name').agg(lambda x: list(x))
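
Applied to the layout described in the question, the grouping could be done on the file column instead of "Name". The following is a sketch under assumptions: the column names `filename`, `ymin` and `ymax` are guessed from the question's description, the sample rows are invented, and `y_by_file` is a hypothetical helper name.

```python
import pandas as pd

# Invented sample data; the column names are assumptions.
df = pd.DataFrame({
    "filename": ["a.jpg", "a.jpg", "a.jpg", "b.jpg", "b.jpg"],
    "ymin": [4, 9, 76, 2, 2],
    "ymax": [43, 47, 122, 40, 44],
})

def y_by_file(frame):
    """Return one list of [ymin, ymax] pairs per file."""
    return [
        group[["ymin", "ymax"]].values.tolist()
        # sort=False keeps the files in order of first appearance.
        for _, group in frame.groupby("filename", sort=False)
    ]

bigy = y_by_file(df)
print(bigy)
# [[[4, 43], [9, 47], [76, 122]], [[2, 40], [2, 44]]]
```

The inner lists can have different lengths, so the result stays a plain nested Python list rather than a rectangular array.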
