This may sound like a strange question at first, but I found it hard to find "standard" terms when talking about elements of data of a long format
. So I thought I'd just as well use the same terms as Hadley Wickham uses in one of the first examples in his article on Tidy Data:
In a sample of my real world data, row contains dates, column contains categories, and value contains prices like this:
Input
row column value
0 21.08.2020 A 43
1 21.08.2020 A 36
2 21.08.2020 B 36
3 21.08.2020 C 28
4 22.08.2020 A 16
5 22.08.2020 B 40
6 22.08.2020 B 34
Here, the column
values are not as regular as the image above. Some column values are missing for some row values. How can I include those column names in the same dataset with value set to 0
? In the sample dataframe above, column C
only occurs for row = 21.08.2020
:
Is there a pandas function that can take this into consideration and include 22.08.2020 C 0
?
Desired output
row column value
0 21.08.2020 A 43
1 21.08.2020 A 36
2 21.08.2020 B 36
3 21.08.2020 C 28
4 22.08.2020 A 16
5 22.08.2020 B 40
6 22.08.2020 B 34
7 22.08.2020 C 0
I've tried an approach with retrieving all unique column values = ['A', 'B', 'C']
, and then looping through all row values and inserting the columns missing with value = 0
, but that turned into a real mess really fast. So any other suggestions would be great!
Edit: From long to wide using pd.pivot
Using pd.pivot_table(df1,index='row',columns='column',values='value')
will turn the Input dataframe above into:
column A B C
row
21.08.2020 39.5 36.0 28.0
22.08.2020 16.0 37.0 NaN
Here, NaN
is included by default for column=C
and row=22.08.2020
. So the case now remains to melt or pivot this dataframe into the desired output without dropping the NaN
.
Edit 2: sample dataframe
import pandas as pd
df=pd.DataFrame({'row': {0: '21.08.2020',
1: '21.08.2020',
2: '21.08.2020',
3: '21.08.2020',
4: '22.08.2020',
5: '22.08.2020',
6: '22.08.2020'},
'column': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'A', 5: 'B', 6: 'B'},
'value': {0: 43, 1: 36, 2: 36, 3: 28, 4: 16, 5: 40, 6: 34}})