I'm working with a dataframe whose row order must be preserved. I would like to split it into chunks that I then process separately.
The split is based on the third column, type: all contiguous records with the same value of type (or of any given categorical column) should end up in one chunk. If possible, I want to do this in a pythonic way, but I can only think of solutions that iterate through the dataframe row by row. This has to work on dataframes with roughly tens of thousands of entries, and I have no idea what the fastest strategy is. Here is a small example of what I have:
     value_1   value_2 type
0  -0.005842 -0.494596    a
1   0.697689  0.354717    a
2  -0.354206 -1.776550    a
3   2.154078  0.344629    a
4   1.072475  1.004945    a
5  -1.338075  0.175607    b
6  -1.913883 -0.123627    b
7  -0.021376 -0.170775    b
8  -0.274882 -0.043913    b
9   0.676371 -0.691243    b
10  0.440201 -0.577944    c
11 -0.689345 -0.445433    b
12  1.540386 -1.084499    c
13  0.236204 -0.072807    b
14 -0.257084  0.848501    c
15  0.681666 -0.265254    b
16 -1.168614 -0.359998    c
17  0.355938  1.529444    b
18  0.292976 -0.301847    c
19  0.670068  0.735191    b
20  0.551594 -0.074768    a
21 -1.251568 -0.022201    a
22  0.376663 -1.556191    a
23 -0.266714  0.860436    d
24 -0.871324  1.014529    d
25  1.504529 -0.657725    d
And here is how I would like to split it (the numeric values differ from the ones above; only the grouping by type matters):
     value_1   value_2 type
0   1.411723 -0.836490    a
1   0.482826  1.625925    a
2  -0.054475  2.046166    a
3   0.020816  0.155194    a
4   0.840539  0.287658    a

     value_1   value_2 type
5   0.257208 -2.311165    b
6  -1.545194 -0.193307    b
7   0.197849 -1.276644    b
8   0.074072 -0.172764    b
9  -2.562816  0.393645    b

     value_1   value_2 type
10  0.258265 -0.978293    c

     value_1   value_2 type
11 -0.804841  -0.78802    b

     value_1   value_2 type
12 -0.509034  1.116428    c

     value_1   value_2 type
13 -0.264252  1.025199    b

     value_1   value_2 type
14 -0.268105 -0.795613    c

     value_1   value_2 type
15  0.481051  0.184827    b

     value_1   value_2 type
16  1.242139  0.401806    c

     value_1   value_2 type
17  1.301684  0.281108    b

     value_1   value_2 type
18  0.189178  0.894425    c

     value_1   value_2 type
19 -0.093207  0.894564    b

     value_1   value_2 type
20 -2.231735  0.250696    a
21 -0.276050 -0.712792    a
22  0.298974 -0.529791    a

     value_1   value_2 type
23  0.115159  2.769695    d
24  0.636069 -1.066387    d
25  1.048230  1.500125    d
Something like a groupby that just returns a list of slices according to the value of the chosen column would be perfect, but I haven't found an existing function that does this.
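
To make the request concrete, here is a minimal sketch of one loop-free way this could work, assuming pandas. It marks every row whose value differs from the previous row, turns those marks into a run id with a cumulative sum, and groups by that id. The helper name split_contiguous and the tiny example frame are hypothetical and only for illustration; this has not been benchmarked on large frames.

```python
import pandas as pd


def split_contiguous(df, column):
    """Return a list of sub-frames, one per run of equal values in `column`.

    Hypothetical helper (not a pandas function). Row order is preserved.
    Note: missing values never compare equal, so each NaN forms its own run.
    """
    values = df[column]
    # True on every row that starts a new run (the first row always does,
    # because shift() puts a missing value in front of it).
    starts_new_run = values.ne(values.shift())
    # Cumulative sum of the flags gives each run a distinct, increasing id.
    run_id = starts_new_run.cumsum()
    # Group on the raw array so the result does not depend on index alignment.
    return [chunk for _, chunk in df.groupby(run_id.to_numpy(), sort=False)]


# Small made-up frame with the same kind of pattern as the example above.
df = pd.DataFrame({
    "value_1": range(9),
    "type": list("aabbcbaad"),
})

chunks = split_contiguous(df, "type")
for chunk in chunks:
    print(chunk, end="\n\n")
```

Each element of chunks is an ordinary DataFrame that keeps its original index labels, so concatenating the chunks in order gives back the original frame.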