88

I have a list of tuples like

data = [
('r1', 'c1', avg11, stdev11),
('r1', 'c2', avg12, stdev12),
('r2', 'c1', avg21, stdev21),
('r2', 'c2', avg22, stdev22)
]

and I would like to put them into a pandas DataFrame with rows named by the first column and columns named by the 2nd column. It seems the way to take care of the row names is something like pandas.DataFrame([x[1:] for x in data], index = [x[0] for x in data]) but how do I take care of the columns to get a 2x2 matrix (the output from the previous set is 3x4)? Is there a more intelligent way of taking care of row labels as well, instead of explicitly omitting them?

EDIT It seems I will need 2 DataFrames - one for averages and one for standard deviations, is that correct? Or can I store a list of values in each "cell"?

smci
  • 32,567
  • 20
  • 113
  • 146
gt6989b
  • 4,125
  • 8
  • 46
  • 64
  • 1
    Definite duplicate of < http://stackoverflow.com/questions/11415701/efficiently-construct-pandas-dataframe-from-large-list-of-tuples-rows?rq=1 > – ely Nov 13 '13 at 18:29
  • 2
    @EMS not at all. I saw that question, he did not need the 2D pivoting. – gt6989b Nov 13 '13 at 18:33
  • 1
    @ely, gt6989b I retitled this "...from list of tuples of (row,col,values)" to make it clear why this is **not a duplicate** of "...from tuples" – smci Feb 16 '18 at 03:13
  • duplicate of https://stackoverflow.com/questions/28200157/list-of-tuples-to-dataframe-conversion – doubts Sep 12 '18 at 15:29

3 Answers3

70

You can pivot your DataFrame after creating:

>>> df = pd.DataFrame(data)
>>> df.pivot(index=0, columns=1, values=2)
# avg DataFrame
1      c1     c2
0               
r1  avg11  avg12
r2  avg21  avg22
>>> df.pivot(index=0, columns=1, values=3)
# stdev DataFrame
1        c1       c2
0                   
r1  stdev11  stdev12
r2  stdev21  stdev22
Roman Pekar
  • 107,110
  • 28
  • 195
  • 197
51

I submit that it is better to leave your data stacked as it is:

df = pandas.DataFrame(data, columns=['R_Number', 'C_Number', 'Avg', 'Std'])

# Possibly also this if these can always be the indexes:
# df = df.set_index(['R_Number', 'C_Number'])

Then it's a bit more intuitive to say

df.set_index(['R_Number', 'C_Number']).Avg.unstack(level=1)

This way it is implicit that you're seeking to reshape the averages, or the standard deviations. Whereas, just using pivot, it's purely based on column convention as to what semantic entity it is that you are reshaping.

ely
  • 74,674
  • 34
  • 147
  • 228
  • 1
    +1, useful explicitness. I am explicitly interested in a 2D table -- to allow me to search for values, indexed by both row- and column- lists, as well as access each of the dimensions separately. Can you do that with stacked data? – gt6989b Nov 13 '13 at 18:50
  • 2
    Yep. Much better with stacked data. Think of a relational database table, like in SQL. You don't go blowing out a whole column into a bunch of repeated columns do you? That should only happen in special cases (I think it is the tall-to-wide pattern). Normally, you treat multiple columns as indexes, and make selections by partially binding one of the index columns, or binding them all to get a specific record. – ely Nov 13 '13 at 18:51
  • 2
    So, in your case, after setting the index to be `[R_Number, C_Number]` you can do `df.ix[('r1','c2')]`, for example. Or you can leave those both as regular columns and use logical indexing: `df[(df.R_Number == 'r1') & (df.C_Number == 'c2')]` – ely Nov 13 '13 at 18:53
37

This is what I expected to see when I came to this question:

#!/usr/bin/env python

import pandas as pd


df = pd.DataFrame([(1, 2, 3, 4),
                   (5, 6, 7, 8),
                   (9, 0, 1, 2),
                   (3, 4, 5, 6)],
                  columns=list('abcd'),
                  index=['India', 'France', 'England', 'Germany'])
print(df)

gives

         a  b  c  d
India    1  2  3  4
France   5  6  7  8
England  9  0  1  2
Germany  3  4  5  6
Martin Thoma
  • 124,992
  • 159
  • 614
  • 958