
I have a list of tuples, y, that I want to convert to a DataFrame x. There are five tuples in y, and each tuple has 33 elements. The first three elements of every tuple are text and are identical across all five tuples.

I'd like the first three elements of each tuple to be the column names of the DataFrame. I want to convert the list of tuples into a 10 x 3 DataFrame. The tricky part is that row 1 of the DataFrame would be elements 4, 5, 6 of y[1], row 2 would be elements 7, 8, 9 of y[1], row 3 would be elements 10, 11, 12, and so on.

y looks like this (not showing the entire list):

List of tuples y
y[0]     y[1]     y[2]     y[3]     y[4]

Formula  Formula  Formula  Formula  Formula
Phase    Phase    Phase    Phase    Phase
Value    Value    Value    Value    Value
"a"      "a"      "a"      "a"      "a"
"nxxx"   "nxxx"   "nxxx"   "nxxx"   "nxxx"
3.2      3.7      22.4     18.2     9.7
"h45"    "h45"    "h45"    "h45"    "h45"
"cacpp"  "cacpp"  "cacpp"  "cacpp"  "cacpp"
45.2     61.76    101.2    171.89   203.7
"trx"    "trx"    "trx"    "trx"    "trx"
"v2o5p"  "v2o5p"  "v2o5p"  "v2o5p"  "v2o5p"
0.24     0.81     0.97     1.2      1.98
"blnt"   "blnt"   "blnt"   "blnt"   "blnt"
"g2o3"   "g2o3"   "g2o3"   "g2o3"   "g2o3"
807.2    905.8    10089    10345    10979

I want to convert y into DataFrame x as follows:

DataFrame x
column 1  column 2  column 3

Formula   Phase     Value
"a"       "nxxx"    3.2
"h45"     "cacpp"   45.2
"trx"     "v2o5p"   0.24
"blnt"    "g2o3"    807.2
"a"       "nxxx"    3.7
"h45"     "cacpp"   61.76
"trx"     "v2o5p"   0.81
"blnt"    "g2o3"    905.8
"a"       "nxxx"    22.4
"h45"     "cacpp"   101.2
"trx"     "v2o5p"   0.97
"blnt"    "g2o3"    10089
etc       etc       etc

I know there must be an easy way to iterate through the list of tuples, but I'm new to pandas and relatively new to Python, so I'm struggling to find a clean way to do this.

user3720101

2 Answers


Basically, you need to:

1) remove the first 3 elements of each tuple (only one copy is needed, for the column headers)
2) concatenate the remaining elements of all tuples in y
3) reshape the result to 3 columns

All of this can be done with numpy, which you are probably familiar with if you are using pandas.

#Step 1) and 2) above.
In [83]: data = np.concatenate ([z[3:] for z in y])

#reshape
In [84]: data = data.reshape(-1, 3)

#Now data is a numpy array which looks like what you need:
In [85]: data
Out[85]: 
array([['a', 'nxxx', '3.2'],
       ['h45', 'cacpp', '45.2'],
       ['trx', 'v2o5p', '0.24'],
       ['blnt', 'g2o3', '807.2'],
       ['a', 'nxxx', '3.7'],
       ['h45', 'cacpp', '61.76'],
       ['trx', 'v2o5p', '0.81'],
       ['blnt', 'g2o3', '905.8'],
       ['a', 'nxxx', '22.4'],
       ['h45', 'cacpp', '101.2'],
       ['trx', 'v2o5p', '0.97'],
       ['blnt', 'g2o3', '10089'],
       ['a', 'nxxx', '18.2'],
       ['h45', 'cacpp', '171.89'],
       ['trx', 'v2o5p', '1.2'],
       ['blnt', 'g2o3', '10345'],
       ['a', 'nxxx', '9.7'],
       ['h45', 'cacpp', '203.7'],
       ['trx', 'v2o5p', '1.98'],
       ['blnt', 'g2o3', '10979']], 
      dtype='|S6')

You can then put data into a pandas DataFrame:

In [86]: df = pd.DataFrame (data, columns=y[0][:3])

In [87]: df
Out[87]: 
   Formula  Phase   Value
0        a   nxxx     3.2
1      h45  cacpp    45.2
2      trx  v2o5p    0.24
3     blnt   g2o3   807.2
4        a   nxxx     3.7
5      h45  cacpp   61.76
6      trx  v2o5p    0.81
7     blnt   g2o3   905.8
8        a   nxxx    22.4
9      h45  cacpp   101.2
10     trx  v2o5p    0.97
11    blnt   g2o3   10089
12       a   nxxx    18.2
13     h45  cacpp  171.89
14     trx  v2o5p     1.2
15    blnt   g2o3   10345
16       a   nxxx     9.7
17     h45  cacpp   203.7
18     trx  v2o5p    1.98
19    blnt   g2o3   10979
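
One caveat with the array approach: because each tuple mixes text and numbers, `np.concatenate` coerces every element to a string, so the Value column holds text rather than numbers (note the quoted numbers and the string dtype in the array output). A minimal sketch of converting that column back, assuming Python 3 and a recent pandas, and using shortened made-up tuples in place of the real `y`:

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened stand-in for the real list of tuples.
y = [
    ("Formula", "Phase", "Value", "a", "nxxx", 3.2, "h45", "cacpp", 45.2),
    ("Formula", "Phase", "Value", "a", "nxxx", 3.7, "h45", "cacpp", 61.76),
]

# Drop the three header elements, flatten, and reshape to three columns.
data = np.concatenate([z[3:] for z in y]).reshape(-1, 3)
df = pd.DataFrame(data, columns=list(y[0][:3]))

# Every cell is a string at this point; restore the numeric column.
df["Value"] = pd.to_numeric(df["Value"])
print(df.dtypes)
```

If the real data can contain non-numeric entries in that column, `pd.to_numeric(..., errors="coerce")` turns them into NaN instead of raising.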
Happy001
  • This looks really great, but I'm still getting an error. I wrapped it in a function, `def phs_tab(y): data = np.concatenate([z[3:] for z in y]); data = data.reshape(-1, 3); df = pd.DataFrame(data, columns=y[0][:3]); print df`, and called `phs_tab(y)`. The line `data = data.reshape(-1, 3)` raised `ValueError: total size of new array must be unchanged` – user3720101 Jun 12 '14 at 02:47
  • Try `for z in y: print len(z)` to check whether every tuple in y has the expected length. – Happy001 Jun 12 '14 at 02:54
  • This step gave me the error: data = data.reshape(-1, 3) – user3720101 Jun 12 '14 at 02:57
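
For context on the error discussed in these comments: `reshape(-1, 3)` raises this kind of `ValueError` when the total number of concatenated elements is not a multiple of 3. A small diagnostic along the lines Happy001 suggests, written as a sketch for Python 3, with a hypothetical helper name `check_tuples` and made-up sample data, lists the tuples whose data part does not divide evenly into rows:

```python
def check_tuples(y, n_header=3, n_cols=3):
    """Return (index, length) for each tuple whose data part is not a multiple of n_cols."""
    bad = []
    for i, z in enumerate(y):
        if (len(z) - n_header) % n_cols != 0:
            bad.append((i, len(z)))
    return bad

# Hypothetical sample: the second tuple is missing its last value.
y = [
    ("Formula", "Phase", "Value", "a", "nxxx", 3.2),
    ("Formula", "Phase", "Value", "a", "nxxx"),
]
print(check_tuples(y))
```

An empty result means every tuple can be split into complete rows of three.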

Assuming some dummy data:

In [122]: y1 = ('Formula', 'Phase', 'Value', 1, 2, 3, 4, 5, 6)
In [123]: y2 = ('Formula', 'Phase', 'Value', 7, 8, 9, 10, 11, 12)
In [124]: y = [y1, y2]

This uses the 'grouper' recipe from another answer (the same recipe appears in the itertools documentation) to iterate by groups.

In [125]: from itertools import izip_longest

In [126]: def grouper(iterable, n, fillvalue=None):
     ...:     args = [iter(iterable)] * n
     ...:     return izip_longest(*args, fillvalue=fillvalue)

Then you could do something like the following. `grouper(y_tuple[3:], 3)` iterates over the tuple in groups of 3, excluding the first 3 elements.

In [127]: columns = y[0][:3]

In [128]: data = []
     ...: for y_tuple in y:
     ...:     for group_of_3 in grouper(y_tuple[3:], 3):
     ...:         data.append(list(group_of_3))
     ...:         

In [129]: data
Out[129]: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

In [130]: pd.DataFrame(data=data, columns=columns)
Out[130]: 
   Formula  Phase  Value
0        1      2      3
1        4      5      6
2        7      8      9
3       10     11     12
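
The session above is Python 2; in Python 3, `izip_longest` is named `zip_longest`. A sketch of the same grouping for Python 3, using the same dummy data and stopping once the rows are built:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # Reuse one iterator n times so zip_longest pulls n consecutive items per group.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

y = [
    ("Formula", "Phase", "Value", 1, 2, 3, 4, 5, 6),
    ("Formula", "Phase", "Value", 7, 8, 9, 10, 11, 12),
]

columns = list(y[0][:3])
# One row per group of three data elements, across all tuples in order.
rows = [list(group) for y_tuple in y for group in grouper(y_tuple[3:], 3)]
print(rows)
```

`rows` and `columns` can then be passed to `pd.DataFrame(data=rows, columns=columns)` as in the answer. Note that `zip_longest` pads a short final group with `fillvalue` (`None` by default) instead of raising, so a tuple of unexpected length shows up as missing values rather than as an error.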
chrisb