Construct pandas DataFrame from list of tuples of (row,col,values)

Question

I have a list of tuples like

data = [
('r1', 'c1', avg11, stdev11),
('r1', 'c2', avg12, stdev12),
('r2', 'c1', avg21, stdev21),
('r2', 'c2', avg22, stdev22)
]

and I would like to put them into a pandas DataFrame with rows named by the first column and columns named by the 2nd column. It seems the way to take care of the row names is something like pandas.DataFrame([x[1:] for x in data], index = [x[0] for x in data]) but how do I take care of the columns to get a 2x2 matrix (the output from the previous set is 3x4)? Is there a more intelligent way of taking care of row labels as well, instead of explicitly omitting them?

EDIT It seems I will need 2 DataFrames - one for averages and one for standard deviations, is that correct? Or can I store a list of values in each "cell"?

Definite duplicate of < http://stackoverflow.com/questions/11415701/efficiently-construct-pandas-dataframe-from-large-list-of-tuples-rows?rq=1 > — ely, Nov 13 '13 at 18:29
@EMS not at all. I saw that question, he did not need the 2D pivoting. — gt6989b, Nov 13 '13 at 18:33
@ely, gt6989b I retitled this "...from list of tuples of (row,col,values)" to make it clear why this is **not a duplicate** of "...from tuples" — smci, Feb 16 '18 at 03:13
duplicate of https://stackoverflow.com/questions/28200157/list-of-tuples-to-dataframe-conversion — doubts, Sep 12 '18 at 15:29

score 70 · Accepted Answer · answered Nov 13 '13 at 18:28

70

You can pivot your DataFrame after creating:

>>> df = pd.DataFrame(data)
>>> df.pivot(index=0, columns=1, values=2)
# avg DataFrame
1      c1     c2
0               
r1  avg11  avg12
r2  avg21  avg22
>>> df.pivot(index=0, columns=1, values=3)
# stdev DataFrame
1        c1       c2
0                   
r1  stdev11  stdev12
r2  stdev21  stdev22

answered Nov 13 '13 at 18:28

Roman Pekar

107,110
28
195
197

3

The row with index `0` and column with name `1` aren't very beautiful... – drevicko Aug 12 '16 at 14:56

score 51 · Answer 2 · answered Nov 13 '13 at 18:44

51

I submit that it is better to leave your data stacked as it is:

df = pandas.DataFrame(data, columns=['R_Number', 'C_Number', 'Avg', 'Std'])

# Possibly also this if these can always be the indexes:
# df = df.set_index(['R_Number', 'C_Number'])

Then it's a bit more intuitive to say

df.set_index(['R_Number', 'C_Number']).Avg.unstack(level=1)

This way it is implicit that you're seeking to reshape the averages, or the standard deviations. Whereas, just using pivot, it's purely based on column convention as to what semantic entity it is that you are reshaping.

answered Nov 13 '13 at 18:44

ely

74,674
34
147
228

1

+1, useful explicitness. I am explicitly interested in a 2D table -- to allow me to search for values, indexed by both row- and column- lists, as well as access each of the dimensions separately. Can you do that with stacked data? – gt6989b Nov 13 '13 at 18:50
2

Yep. Much better with stacked data. Think of a relational database table, like in SQL. You don't go blowing out a whole column into a bunch of repeated columns do you? That should only happen in special cases (I think it is the tall-to-wide pattern). Normally, you treat multiple columns as indexes, and make selections by partially binding one of the index columns, or binding them all to get a specific record. – ely Nov 13 '13 at 18:51
2

So, in your case, after setting the index to be `[R_Number, C_Number]` you can do `df.ix[('r1','c2')]`, for example. Or you can leave those both as regular columns and use logical indexing: `df[(df.R_Number == 'r1') & (df.C_Number == 'c2')]` – ely Nov 13 '13 at 18:53

score 37 · Answer 3 · answered Nov 29 '17 at 07:21

37

This is what I expected to see when I came to this question:

#!/usr/bin/env python

import pandas as pd


df = pd.DataFrame([(1, 2, 3, 4),
                   (5, 6, 7, 8),
                   (9, 0, 1, 2),
                   (3, 4, 5, 6)],
                  columns=list('abcd'),
                  index=['India', 'France', 'England', 'Germany'])
print(df)

gives

         a  b  c  d
India    1  2  3  4
France   5  6  7  8
England  9  0  1  2
Germany  3  4  5  6

answered Nov 29 '17 at 07:21

Martin Thoma

124,992
159
614
958

You misread the question. The data originally given in the question already has row and column labels for each record in question. – gt6989b Dec 05 '17 at 15:53
11

@gt6989b No, I didn't. I didn't try to answer the original question, but the question people (might) have when they come to this page. – Martin Thoma Dec 05 '17 at 19:58
8

@MartinThoma Thanks, indeed this is what I am looking for on this page. – ssword Apr 12 '18 at 19:19
Indeed I was expecting this too – cikatomo May 10 '21 at 21:33

Construct pandas DataFrame from list of tuples of (row,col,values)

3 Answers3

Linked

Related