I have a dataframe like this. The column called filename shows which file each row belongs to, so you can tell whether two rows are in the same file or in different files. I have also created another column, "count", that holds the total number of rows in that file.
I extracted ymin and ymax from every row and collected them all into one list, so the result is a 2D list:
y = [[4, 43], [9, 47], [76, 122], [30, 74], [10, 47], [81, 125], [84, 124], [47, 90], [1, 38],[2, 40], [2, 44], [4, 48], [5, 48], [6, 44], [8, 45], [75, 116], [73, 123], [28, 73], [39, 84], [84, 121], [2, 39],...]
However, this just puts all the coordinates into one list, with no way of knowing which pairs belong to the first file and which belong to the second.
My idea is to build a 3D list like this, with one inner list per file:
y = [[[4, 43], [9, 47], [76, 122], [30, 74], [10, 47], [81, 125], [84, 124], [47, 90], [1, 38]],[[2, 40], [2, 44], [4, 48], [5, 48], [6, 44], [8, 45], [75, 116], [73, 123], [28, 73], [39, 84], [84, 121], [2, 39]],...]
You can see that the pairs from [4, 43] to [1, 38] are in the same file.
Likewise, the pairs from [2, 40] to [2, 39] are together in another file.
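To make the target structure concrete, here is a minimal sketch that builds the 3D list straight from the filename column using only the standard library. It assumes the filenames and the [ymin, ymax] pairs are parallel lists in dataframe row order and that rows of the same file are adjacent; the function name `group_pairs_by_file` and the file names are made up for illustration.

```python
from itertools import groupby


def group_pairs_by_file(filenames, pairs):
    """Group [ymin, ymax] pairs into one sub-list per file.

    `filenames` and `pairs` are assumed to be parallel lists in row order
    (for example the "filename" column and the 2D list y).
    """
    if len(filenames) != len(pairs):
        raise ValueError("filenames and pairs must have the same length")
    rows = zip(filenames, pairs)
    # groupby starts a new group every time the filename changes,
    # so rows of the same file must sit next to each other.
    return [
        [pair for _, pair in group]
        for _, group in groupby(rows, key=lambda row: row[0])
    ]


# Hypothetical sample data: three rows in one file, two in the next.
filenames = ["a.jpg", "a.jpg", "a.jpg", "b.jpg", "b.jpg"]
y = [[4, 43], [9, 47], [76, 122], [2, 40], [2, 44]]
grouped = group_pairs_by_file(filenames, y)
print(grouped)
```

Because the grouping key is the filename itself, two neighbouring files with the same number of rows still end up in separate inner lists.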
Here is my current attempt:
def get_y_coordinate(y, count):
    """
    Create a 3D list of y-coordinates in which each inner list holds the coordinates of one file.
    :param y: the flat list of y-coordinates, e.g. [4, 43, 9, 47, ...]
    :param count: the list taken from the column "count" from the dataframe
    """
    c = 2  # Size of each chunk (one [ymin, ymax] pair)
    fi_y = lambda y, c: [y[i:i+c] for i in range(0, len(y), c)]  # Split y into chunks of 2, e.g. from [4,43,9,47] to [4,43],[9,47]
    y = fi_y(y, c)  # Now y is [[4, 43], [9, 47], ...]
    # This is my current approach: I create a new list called bigy.
    bigy = []
    current = 0
    for i in count:
        if current != i:
            bigy.append([y[j] for j in range(current, i+current)])
            current = i
    return bigy
>> bigy = [[[4, 43], [9, 47], [76, 122], [30, 74], [10, 47], [81, 125], [84, 124], [47, 90], [1, 38]],[[2, 40], [2, 44], [4, 48], [5, 48], [6, 44], [8, 45], [75, 116], [73, 123], [28, 73], [39, 84], [84, 121], [2, 39]],...]
This gives the correct result for the first few hundred files, but from around file 700 onward it no longer works. I need another approach to this problem. If anyone has been patient enough to read this far and can help me out, thank you very much!
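One thing that stands out in the attempt is that current is overwritten with the latest count (current = i) instead of being advanced by it, and the check current != i cannot detect a new file when two neighbouring files have the same number of rows. Either could be behind the failure, though that is a guess without the data. The sketch below is one possible alternative: it keeps a running offset into the list of pairs. It assumes "count" has one entry per row (the row count of that row's file) and that the rows of a file are contiguous; the name `split_by_row_counts` is made up for illustration.

```python
def split_by_row_counts(pairs, count):
    """Split `pairs` into per-file chunks using a per-row "count" column.

    Assumes count[k] is the number of rows in the file that row k belongs
    to, and that the rows of each file are contiguous.
    """
    if len(pairs) != len(count):
        raise ValueError("pairs and count must have the same length")
    chunks = []
    start = 0
    while start < len(pairs):
        size = count[start]  # number of rows in the file starting at `start`
        if size <= 0:
            raise ValueError("row counts must be positive")
        chunks.append(pairs[start:start + size])
        start += size  # advance the offset by the file size, do not overwrite it
    return chunks


# Hypothetical sample: a 3-row file followed by two 2-row files back to back.
y = [[4, 43], [9, 47], [76, 122], [2, 40], [2, 44], [5, 48], [6, 44]]
count = [3, 3, 3, 2, 2, 2, 2]
chunks = split_by_row_counts(y, count)
print(chunks)
```

The two consecutive 2-row files in the sample are kept apart because the loop jumps from the first row of one file to the first row of the next, rather than comparing count values.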