2

Ultimately I have to perform matrix operations to my dataset.

What I am trying to figure out is how to get piled data (specially from SQL) and transform it to a matrix ? (It can be a 2d numpy array).

Suppose my SQL query returns something of the sort:

Name Data         Value 
a    2016-05-01   5
b    2016-05-01   7
c    2016-05-01   7
a    2016-06-01   4
b    2016-06-01   3
c    2016-06-01   2
d    2016-06-01   2

The output I'm looking for is

Data/Name    a   b   c   d
2016-05-01   5   7   7   NaN
2016-06-01   4   3   2   2

I know I could loop through all values, but what I'm looking for here is performance. Is there a more direct/fast/pythonic way?

Pedro Braz
  • 2,261
  • 3
  • 25
  • 48

1 Answers1

1

Use the function pivot against your DataFrame.

For a given df:

df
Out[19]: 
  Name        Data  Value
0    a  2016-05-01      5
1    b  2016-05-01      7
2    c  2016-05-01      7
3    a  2016-06-01      4
4    b  2016-06-01      3
5    c  2016-06-01      2
6    d  2016-06-01      2

Call pivot by setting the index and colums you want to use from your original DataFrame:

df.pivot('Data', 'Name')
Out[20]: 
           Value               
Name           a    b    c    d
Data                           
2016-05-01   5.0  7.0  7.0  NaN
2016-06-01   4.0  3.0  2.0  2.0
Zeugma
  • 31,231
  • 9
  • 69
  • 81