1

I got a DataFrame looks like this (call it df1):

id    date   value

A1    day1    0.1
A1    day2    0.2
A1    day3   -0.1
A2    day1    0.3
A3    day2    0.2
A3    day4   -0.5

I need to convert the value to a matrix for calculation, so I think I need to transform the DataFrame to this form (call it df2) first (and then convert to a numpy array):

      day1  day2  day3  day4  day5
  A1   0.1   0.2  -0.1   0.0   0.0
  A2   0.3   0.0   0.0   0.0   0.0
  A3   0.0   0.2   0.0  -0.5   0.0 

if an id don't have value on a day, just set that day's value to 0 (and probably none of the ids have a full-date value).

What I think is to generate an empty DataFrame (call it df3) first and then fill df1's data in it:

      day1  day2  day3  day4  day5
  A1   0.0   0.0   0.0   0.0   0.0
  A2   0.0   0.0   0.0   0.0   0.0
  A3   0.0   0.0   0.0   0.0   0.0

But I don't know the proper way to iterate df1's value to match the cell in df3 (And people say it's a bad idea to iterate a dataframe ?), or is there a better approach (like pivot or merge)?

cottontail
  • 10,268
  • 18
  • 50
  • 51
chapayev
  • 15
  • 1
  • 4
  • Also [How can I pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) – Rodalm Jun 16 '22 at 03:26
  • The date series is generated by another specific function, so in some dates (like day5, day7.. day6 not exits!) there is no data of all those ids (but an id's date is always a subset of date series ), and I need to set these date's value to 0, so I think I still need to merge with df3 (after transformed by pivot) – chapayev Jun 16 '22 at 03:27
  • 1
    @notarobot I tend to agree with you the old duplicate target did not apply. I've updated the duplicate targets list to a list of targets that can be modified to handle this kind of reshaping. I've also put together an [ideone](https://ideone.com/fvBz05) of how the various answers can be modified to match the output DataFrame shape. I know none of the targets end with a numpy array, but the reshaping is the harder part of the problem. It's trivial to [Convert pandas dataframe to NumPy array](/q/13187778/15497888) once the dataframe is in the correct shape. – Henry Ecker Jul 21 '22 at 23:21

1 Answers1

0

This should work.

# pivot and reindex to add the missing days
df.pivot(*df).reindex(['day1', 'day2', 'day3', 'day4', 'day5'], axis=1).fillna(0).values

# array([[ 0.1,  0.2, -0.1,  0. ,  0. ],
#        [ 0.3,  0. ,  0. ,  0. ,  0. ],
#        [ 0. ,  0.2,  0. , -0.5,  0. ]])
cottontail
  • 10,268
  • 18
  • 50
  • 51