0

I am learning Pandas and have been facing difficulty understanding the pivot tables. Below is the sample program that I am running.

import pandas as pd

df = pd.read_csv('/Users/xxx/Desktop/df.csv')

print(df)

df = df.pivot_table(index='__timestamp', columns=[], values=['passed_count', 'failed_count'])

print(df)

And the programs prints below outputs-

   __timestamp failed_count  passed_count Unnamed: 3
0     27/05/18    0.019417       0.980583           
1     03/06/18    0.427136       0.839196           
2     10/06/18    0.839416       0.854015           
3     17/06/18    0.403846       0.913462           
4     24/06/18    1.429688       0.757812           
5     01/07/18    6.781457       0.701987           
6     08/07/18    0.324561       0.929825           
7     15/07/18    0.295082       0.970492           
8     22/07/18    0.849802       0.960474           
9     29/07/18    0.673333       0.923333           
10    05/08/18    0.276657       0.919308           
11    12/08/18    0.242105       0.821053           
12    19/08/18    0.176471       0.976471
       
             passed_count
__timestamp              
01/07/18         0.701987
03/06/18         0.839196
05/08/18         0.919308
08/07/18         0.929825
10/06/18         0.854015
12/08/18         0.821053
15/07/18         0.970492
17/06/18         0.913462
19/08/18         0.976471
22/07/18         0.960474
24/06/18         0.757812
27/05/18         0.980583
29/07/18         0.923333

I am not able to understand the absence of third column after doing the pivot_table(). Is it OK to give multiple values like I did above? What is the significance of the value option that is provided?

Edit:

As asked in the comments-

CSV file contents are-

__timestamp,failed_count,passed_count,
27/05/18,0.019417 ,0.980583, 
03/06/18,0.427136 ,0.839196, 
10/06/18,0.839416 ,0.854015, 
17/06/18,0.403846 ,0.913462, 
24/06/18,1.429688 ,0.757812, 
01/07/18,6.781457 ,0.701987, 
08/07/18,0.324561 ,0.929825, 
15/07/18,0.295082 ,0.970492, 
22/07/18,0.849802 ,0.960474, 
29/07/18,0.673333 ,0.923333, 
05/08/18,0.276657 ,0.919308, 
12/08/18,0.242105 ,0.821053, 
19/08/18,0.176471 ,0.976471,

Output of df.head(), immediately after reading the CSV is

      __timestamp failed_count  passed_count Unnamed: 3
0    27/05/18    0.019417       0.980583           
1    03/06/18    0.427136       0.839196           
2    10/06/18    0.839416       0.854015           
3    17/06/18    0.403846       0.913462           
4    24/06/18    1.429688       0.757812 
sgowd
  • 2,242
  • 22
  • 29
  • pivot_table should work as you are expecting. Can you give a sample of your csv file? – Sean Peters Aug 26 '18 at 10:26
  • Insufficient information. Please print out the head using *df.head()* as soon as you read your CSV data. Without knowing what the data you have read. We cannot comment on the pivot_table generated by you. – Raviteja Ainampudi Aug 26 '18 at 10:27
  • @SeanPeters Edited the question. – sgowd Aug 26 '18 at 10:39
  • Do you want [pivot](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) ? Or need unpivot like `df = df.melt(id_vars='__timestamp', value_vars=['failed_count','passed_count'])` by [`melt`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.melt.html) ? – jezrael Aug 26 '18 at 10:39
  • you show the output of `df.head` and not of `df.head()`. can you fix this ? – rocksportrocker Aug 26 '18 at 10:40
  • 1
    @rocksportrocker fixed. – sgowd Aug 26 '18 at 10:42
  • 1
    I suspect for some reason your second column is being interpreted as a string. Perhaps it is related to the trailing whitespace after each float. The identical code and csv work fine for me but, pandas does appear to silently ignore any string columns defined in the values list. Thus my suspicion... – Sean Peters Aug 26 '18 at 10:49
  • Try `type(df['failed_count'][0])` – Sean Peters Aug 26 '18 at 10:50
  • @SeanPeters, yes that was the case. I removed the spaces and it started printing all the 3 columns. – sgowd Aug 26 '18 at 11:08
  • Glad I could help. I've added the answer below for completeness. – Sean Peters Aug 26 '18 at 11:18
  • @sgowd, for something like this, it may be good to familiarise yourself with `to_numeric()`, `to_datetime()`, `astype()`. That's my understanding at least, maybe someone who knows more can give you better advice. – chillin Aug 26 '18 at 11:20

1 Answers1

2

As we discovered in the comments, pandas' pivot_table function will silently ignore any non-numeric (in this case str) columns in the values list. And the failed_count column was being interpreted as such.

Sean Peters
  • 337
  • 1
  • 8