
I would like to iterate over a large list in which each index is a point in time in a time series. Each element of the list is a 2-dimensional array holding start and end values for 10 user IDs (0-9); the user IDs are the same in every array.

I would then like to create a new data frame in which each row index is one moment in time and each of the 10 user IDs gets its own start and end columns, so that every user has their own time series. It has taken me far too long to figure out this data type, but here is a reproducible sample version:

import numpy as np

list1 = np.transpose(np.array([[1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9,10.0],[1.2,2.3,3.4,4.6,5.5,6.6,7.1,8.2,9.0,10.7]]))
list2 = np.transpose(np.array([[1.4,2.6,3.4,4.9,6.5,5.6,8.7,1.8,4.9,10.8],[1.4,2.4,3.5,4.7,6.5,5.6,7.5,8.4,9.3,10.2]]))
list3 = np.transpose(np.array([[4.1,5.2,5.5,6.1,6.5,5.9,7.7,8.8,9.9,10.0],[1.1,2.2,3.3,4.8,5.5,5.7,7.7,8.6,9.0,10.0]]))

list_of_arrays = [list1,list2,list3]
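To make the structure concrete, here is a minimal sketch, assuming every array has shape `(10, 2)` as in the sample (one row per user ID, columns `[start, end]`). Stacking the list with `np.stack` gives a single 3-D array indexed as `[time, user, start/end]`.

```python
import numpy as np

# Same sample data as above: each array is (10 users, 2 values: start, end)
list1 = np.transpose(np.array([[1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9,10.0],[1.2,2.3,3.4,4.6,5.5,6.6,7.1,8.2,9.0,10.7]]))
list2 = np.transpose(np.array([[1.4,2.6,3.4,4.9,6.5,5.6,8.7,1.8,4.9,10.8],[1.4,2.4,3.5,4.7,6.5,5.6,7.5,8.4,9.3,10.2]]))
list3 = np.transpose(np.array([[4.1,5.2,5.5,6.1,6.5,5.9,7.7,8.8,9.9,10.0],[1.1,2.2,3.3,4.8,5.5,5.7,7.7,8.6,9.0,10.0]]))
list_of_arrays = [list1, list2, list3]

# Stack along a new leading (time) axis: shape (n_times, n_users, 2)
stacked = np.stack(list_of_arrays)
print(stacked.shape)  # (3, 10, 2)

# [start, end] of user ID 0 (the "start1"/"end1" columns) at time 1
print(stacked[1, 0])  # [1.4 1.4]
```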

The output would be as follows:

    start1 end1 start2 end2 start3 end3 ... (etc) ... start10  end10
0    1.1   1.2   2.2   2.3   3.3    3.4 ...            10.0    10.7
1    1.4   1.4   2.6   2.4   3.4    3.5 ...            10.8    10.2
2    4.1   1.1   5.2   2.2   5.5    3.3 ...            10.0    10.0
... (large n rows)
mkrieger1
Aesler

2 Answers


Looks like a simple reshape:

import pandas as pd

pd.DataFrame([arr.reshape(-1) for arr in list_of_arrays])

    0    1    2    3    4    5    6   ...   13   14   15   16   17    18    19
0  1.1  1.2  2.2  2.3  3.3  3.4  4.4  ...  7.1  8.8  8.2  9.9  9.0  10.0  10.7
1  1.4  1.4  2.6  2.4  3.4  3.5  4.9  ...  7.5  1.8  8.4  4.9  9.3  10.8  10.2
2  4.1  1.1  5.2  2.2  5.5  3.3  6.1  ...  7.7  8.8  8.6  9.9  9.0  10.0  10.0

Then you can just add the column names as needed.
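One way to add those names, as a sketch assuming the `startK`/`endK` naming from the question (user ID 0 maps to suffix 1, up to user ID 9 mapping to suffix 10): each `(10, 2)` array is flattened in row-major order, so the values come out as start, end for user 0, then start, end for user 1, and so on. The names therefore have to alternate in the same order.

```python
import numpy as np
import pandas as pd

list1 = np.transpose(np.array([[1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9,10.0],[1.2,2.3,3.4,4.6,5.5,6.6,7.1,8.2,9.0,10.7]]))
list2 = np.transpose(np.array([[1.4,2.6,3.4,4.9,6.5,5.6,8.7,1.8,4.9,10.8],[1.4,2.4,3.5,4.7,6.5,5.6,7.5,8.4,9.3,10.2]]))
list3 = np.transpose(np.array([[4.1,5.2,5.5,6.1,6.5,5.9,7.7,8.8,9.9,10.0],[1.1,2.2,3.3,4.8,5.5,5.7,7.7,8.6,9.0,10.0]]))
list_of_arrays = [list1, list2, list3]

n_users = list_of_arrays[0].shape[0]  # 10 in the sample data

# Row-major flattening yields start, end, start, end, ... one pair per user,
# so the column names alternate in exactly that order.
columns = [f"{kind}{user}" for user in range(1, n_users + 1) for kind in ("start", "end")]

df = pd.DataFrame([arr.reshape(-1) for arr in list_of_arrays], columns=columns)
print(df[["start1", "end1", "start10", "end10"]])
```

Passing `columns=` to the constructor avoids a second renaming step; assigning `df.columns = columns` afterwards would work just as well.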

rafaelc

Another option with ravel:

pd.DataFrame([l.ravel() for l in list_of_arrays])
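For a very large list, a possible variant (not taken from either answer) is to stack everything once with `np.stack` and reshape the resulting 3-D array, instead of building a Python list of 1-D rows. This is a sketch; it should give the same frame as the list-comprehension versions because the reshape uses the same row-major order as `ravel()`.

```python
import numpy as np
import pandas as pd

list1 = np.transpose(np.array([[1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9,10.0],[1.2,2.3,3.4,4.6,5.5,6.6,7.1,8.2,9.0,10.7]]))
list2 = np.transpose(np.array([[1.4,2.6,3.4,4.9,6.5,5.6,8.7,1.8,4.9,10.8],[1.4,2.4,3.5,4.7,6.5,5.6,7.5,8.4,9.3,10.2]]))
list3 = np.transpose(np.array([[4.1,5.2,5.5,6.1,6.5,5.9,7.7,8.8,9.9,10.0],[1.1,2.2,3.3,4.8,5.5,5.7,7.7,8.6,9.0,10.0]]))
list_of_arrays = [list1, list2, list3]

# (n_times, n_users, 2) -> (n_times, n_users * 2), same row-major order as ravel()
stacked = np.stack(list_of_arrays)
df_stacked = pd.DataFrame(stacked.reshape(len(list_of_arrays), -1))

# Same values as the list-comprehension approach
df_ravel = pd.DataFrame([l.ravel() for l in list_of_arrays])
print(df_stacked.equals(df_ravel))  # True
```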
Quang Hoang
  • `ravel`, `flatten`, `reshape(-1)` all would look the same here ;) – rafaelc Oct 07 '20 at 14:41
  • @rafaelc But they are [different inside](https://stackoverflow.com/questions/28930465/what-is-the-difference-between-flatten-and-ravel-functions-in-numpy). – Quang Hoang Oct 07 '20 at 14:42
  • Indeed - but since they all go into a list, which goes into the `DataFrame` constructor, these differences would not matter. We'd have copies in the end anyways! (but good catch) – rafaelc Oct 07 '20 at 14:44
  • Legend! This is why I hate programming sometimes: I spent a whole day creating sub data frames and merging them all back together, and it still didn't give the desired result. Many thanks for those commands. – Aesler Oct 07 '20 at 15:33
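To illustrate the view-versus-copy difference discussed in the comments: `ravel()` and `reshape(-1)` return a view when the memory layout allows it, while `flatten()` always returns a copy. A detail worth noting is that the question's arrays come from `np.transpose`, which returns a non-C-contiguous view, so here even `ravel()` has to copy. `np.shares_memory` and `np.ascontiguousarray` are used only to check this.

```python
import numpy as np

list1 = np.transpose(np.array([[1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9,10.0],[1.2,2.3,3.4,4.6,5.5,6.6,7.1,8.2,9.0,10.7]]))

# np.transpose swaps strides instead of moving data, so list1 is not C-contiguous
print(list1.flags["C_CONTIGUOUS"])  # False

# C-order ravel() cannot be expressed as a view of this layout, so it copies
print(np.shares_memory(list1.ravel(), list1))  # False

c = np.ascontiguousarray(list1)  # C-ordered copy with the same values
print(np.shares_memory(c.ravel(), c))      # True: ravel returns a view
print(np.shares_memory(c.reshape(-1), c))  # True: so does reshape(-1)
print(np.shares_memory(c.flatten(), c))    # False: flatten always copies

# Either way the flattened values are identical
print(np.array_equal(list1.ravel(), c.flatten()))  # True
```

As rafaelc points out, this does not affect the result here, since the `DataFrame` constructor copies the rows into its own storage anyway.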