I am rather new to Python, and VERY new to Pandas (I am having a more difficult time learning Pandas than Python).
I am trying to transform a large dataset, and I am stuck.
I load the data from a CSV file that has the following structure:
DATE        ID   CATEGORY 1   SUCCESS (0 or 1)
1/1/2015    a1   x            0
1/1/2015    a2   y            0
1/1/2015    a3   z            0
1/3/2015    a2   z            0
1/5/2015    a1   x            0
1/7/2015    a2   z            1
1/9/2015    a3   y            0
1/10/2015   a2   z            1
1/11/2015   a3   y            0
My end goal is to group this into a form where, for a specific ID, I can get the series of categories leading up to each success flag, along with an array of the time elapsed (in days) since the previous row for the same ID.
So the result would look something like:
{a2: {'1': ((y,z,z),(0,2,4)), '2': ((z),(0))}}
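To make the target concrete, here is a rough sketch of one way this grouping could be expressed in Pandas. It is only an illustration under some assumptions: the column names CATEGORY and SUCCESS are shortened stand-ins for the real headers, ATTEMPT and ELAPSED are helper columns made up for this example, elapsed time is counted in whole days and resets to 0 at the start of each series, and IDs that never reach a success are left out.

```python
import io
import pandas as pd

# Sample rows from the table above; CATEGORY and SUCCESS are shortened
# column names chosen for this sketch (assumption).
csv_text = """DATE,ID,CATEGORY,SUCCESS
1/1/2015,a1,x,0
1/1/2015,a2,y,0
1/1/2015,a3,z,0
1/3/2015,a2,z,0
1/5/2015,a1,x,0
1/7/2015,a2,z,1
1/9/2015,a3,y,0
1/10/2015,a2,z,1
1/11/2015,a3,y,0
"""

df = pd.read_csv(io.StringIO(csv_text))
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")
df = df.sort_values(["ID", "DATE"], kind="stable")

# Number the series per ID: a new series starts on the row after a success.
# ATTEMPT is a hypothetical helper column.
prev_success = df.groupby("ID")["SUCCESS"].shift(fill_value=0)
df["ATTEMPT"] = prev_success.groupby(df["ID"]).cumsum() + 1

# Days since the previous row of the same ID within the same series;
# the first row of each series gets 0. ELAPSED is a hypothetical helper column.
df["ELAPSED"] = (
    df.groupby(["ID", "ATTEMPT"])["DATE"].diff().dt.days.fillna(0).astype(int)
)

# Build the nested dict, keeping only series that end in a success.
result = {}
for (id_, attempt), grp in df.groupby(["ID", "ATTEMPT"]):
    if grp["SUCCESS"].iloc[-1] != 1:
        continue
    result.setdefault(id_, {})[str(attempt)] = (
        tuple(grp["CATEGORY"].tolist()),
        tuple(grp["ELAPSED"].tolist()),
    )

print(result)
```

The main idea is to turn the success flag into a running counter per ID (shift, then cumulative sum) so that each run of rows up to and including a success gets its own group key.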
I am not sure whether Pandas or NumPy's multidimensional arrays would be better suited to the task. I am also not sure which Pandas functions I should be exploring to accomplish this.
A pointer in the right direction would be greatly appreciated.