1

I have two lists:

a = [1,2,3]
b = [4,5,6]

For each combination (i, j) of elements from a and b, I compute a dataframe X and pick out its max value. I want to collect those values in a single dataframe whose columns and rows are labeled by the elements of a and b.

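import itertools
import pandas as pd

rows = []

for i, j in itertools.product(a, b):
    X = do_something(i, j)  # this is a dataframe
    x_value = X.max().max()  # overall max; use X['col'].max() for a single column
    rows.append((i, j, x_value))  # list.append takes one argument, so pass a tuple

df = pd.DataFrame(rows, columns=['a', 'b', 'x_value'])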

The output dataframe should have columns as a, rows as b, and values as x_value.

    1   2   3
4           
5           
6           
user44840
  • yes that's right -- updated it – user44840 Aug 06 '18 at 17:14
  • Does `func` take scalar `i` and `j`? So it has to be applied iteratively either before or after creating the dataframe? – hpaulj Aug 06 '18 at 17:25
  • No the function is very complicated, but the end result for each iteration (i, j) is a dataframe with many columns being produced. I then choose a value from a column – user44840 Aug 06 '18 at 17:26
  • There are two issues, 1) generating `x_value` for the cartesian product of `a` and `b`, and 2) arranging the values in a Dataframe with `a` and `b` columns and rows. Your code does 1) fine, but makes a different dataframe, one with 3 columns and 9 rows. But the data is all there. – hpaulj Aug 06 '18 at 22:19

4 Answers

2

IIUC, you want to know how to go from a list of (i, j, x) values to a DataFrame where i corresponds to the columns, j the index, and x the value:

For example, if you had:

import itertools

a = [1,2,3]
b = [4,5,6]
func = lambda i, j: i+j
result = [(i, j, func(i,j)) for i, j in itertools.product(a, b)]
print(result)
#[(1, 4, 5),
# (1, 5, 6),
# (1, 6, 7),
# (2, 4, 6),
# (2, 5, 7),
# (2, 6, 8),
# (3, 4, 7),
# (3, 5, 8),
# (3, 6, 9)]

One way to turn this into a DataFrame is to use collections.defaultdict:

from collections import defaultdict
import pandas as pd

d = defaultdict(list)

for i, j, x in result:
    d[i].append(x)

df = pd.DataFrame(d, index=b)
print(df)
#   1  2  3
#4  5  6  7
#5  6  7  8
#6  7  8  9
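
The defaultdict approach relies on itertools.product yielding the values for each i in the same order as b. A possible alternative that does not depend on that ordering is to build the long three-column frame first and reshape it with DataFrame.pivot. This is only a sketch; func is the same stand-in as above and the names long_df and wide_df are illustrative:

```python
import itertools

import pandas as pd

a = [1, 2, 3]
b = [4, 5, 6]
func = lambda i, j: i + j  # stand-in for the real computation

# Long form: one row per (a, b) combination.
result = [(i, j, func(i, j)) for i, j in itertools.product(a, b)]
long_df = pd.DataFrame(result, columns=["a", "b", "x_value"])

# Wide form: b becomes the index, a becomes the columns.
wide_df = long_df.pivot(index="b", columns="a", values="x_value")
print(wide_df)
```

Because pivot matches on the labels themselves, the order in which the rows were appended should not matter.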
pault
2

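IIUC, you can start from an empty frame labeled by a and b, then fill each column from its own name (the element of a) and its index (the elements of b). This works when the function can be applied to the whole index at once: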

df=pd.DataFrame(columns=a,index=b)
df.apply(lambda x : x.index+x.name)
Out[189]: 
   1  2  3
4  5  6  7
5  6  7  8
6  7  8  9
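
If the function only accepts scalars, the same grid can probably be built one column at a time with a dict comprehension. The sketch below assumes a hypothetical func (deliberately asymmetric, so that swapped rows and columns would be visible); it is not the asker's actual function:

```python
import pandas as pd

a = [1, 2, 3]
b = [4, 5, 6]


def func(i, j):
    # Hypothetical stand-in for the real scalar computation.
    return 10 * i + j


# One column per element of a; each column lists func(i, j) in the order of b.
grid = pd.DataFrame({i: [func(i, j) for j in b] for i in a}, index=b)
print(grid)
```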
BENY
0

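If the function can be vectorized, you can avoid itertools.product altogether by using numpy broadcasting. Here b is reshaped into a column and a into a row, so the result has b along the rows and a along the columns:

import numpy as np
import pandas as pd

a = [1,2,3]
b = [4,5,6]
# shape (3, 1) + shape (1, 3) broadcasts to (3, 3): rows follow b, columns follow a
arr = np.array(b).reshape(-1, 1) + np.array(a).reshape(1, -1)
df = pd.DataFrame(arr, columns=a, index=b)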
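
Because i + j is symmetric for these inputs, a swapped orientation would go unnoticed. One way to sanity-check the layout is to compare a broadcast grid against the scalar loop using an asymmetric formula. The formula 2*i + 0.5*j and the name df_check below are assumptions chosen for illustration:

```python
import itertools

import numpy as np
import pandas as pd

a = [1, 2, 3]
b = [4, 5, 6]

# b varies down the rows (shape (3, 1)), a across the columns (shape (1, 3)).
rows = np.array(b).reshape(-1, 1)
cols = np.array(a).reshape(1, -1)
arr = 2 * cols + 0.5 * rows  # asymmetric on purpose
df_check = pd.DataFrame(arr, index=b, columns=a)

# Every cell should agree with the scalar evaluation for that (i, j) pair.
for i, j in itertools.product(a, b):
    assert df_check.loc[j, i] == 2 * i + 0.5 * j
```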
PMende
0
In [134]: a=[1,2,3]
In [135]: b=[4,5,6]

Your list of 'indices' and values:

In [140]: alist = []
In [142]: for i,j in itertools.product(a,b):
     ...:     v = i*2 + j*.5
     ...:     alist.append([i,j,v])
     ...:     
In [143]: alist
Out[143]: 
[[1, 4, 4.0],
 [1, 5, 4.5],
 [1, 6, 5.0],
 [2, 4, 6.0],
 [2, 5, 6.5],
 [2, 6, 7.0],
 [3, 4, 8.0],
 [3, 5, 8.5],
 [3, 6, 9.0]]

A 3 column dataframe from that:

In [144]: df = pd.DataFrame(alist, columns=['a','b','value'])
In [145]: df
Out[145]: 
   a  b  value
0  1  4    4.0
1  1  5    4.5
2  1  6    5.0
3  2  4    6.0
4  2  5    6.5
5  2  6    7.0
6  3  4    8.0
7  3  5    8.5
8  3  6    9.0

One way of using the same data to make a 'grid' dataframe:

In [147]: pd.DataFrame(np.array(alist)[:,2].reshape(3,3), columns=a, index=b)
Out[147]: 
     1    2    3
4  4.0  4.5  5.0
5  6.0  6.5  7.0
6  8.0  8.5  9.0

Oops, that maps the rows and columns wrong; let's transpose the 3x3 array:

In [149]: pd.DataFrame(np.array(alist)[:,2].reshape(3,3).T, columns=a, index=b)
Out[149]: 
     1    2    3
4  4.0  6.0  8.0
5  4.5  6.5  8.5
6  5.0  7.0  9.0

I know numpy well; my experience with pandas is limited. I'm sure there are other ways of constructing such a frame. My guess is that if your value function is complex enough, the iteration mechanism will have a minor effect on the overall run time. Simply evaluating your function for each cell will take up most of the time.
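
For the case described in the comments, where each (i, j) produces a whole dataframe and one value is picked from a column, the loop can stay as it is and only the final reshaping changes. In this rough sketch, do_something and its "score" column are hypothetical placeholders for the real computation, and the reshaping uses set_index followed by unstack:

```python
import itertools

import pandas as pd


def do_something(i, j):
    # Hypothetical placeholder for the real, expensive computation.
    return pd.DataFrame({"score": [i, j, i * j], "other": [0, 1, 2]})


a = [1, 2, 3]
b = [4, 5, 6]

records = []
for i, j in itertools.product(a, b):
    X = do_something(i, j)
    records.append((i, j, X["score"].max()))  # pick one value per (i, j)

long_df = pd.DataFrame(records, columns=["a", "b", "x_value"])

# Move b to the index and spread a across the columns.
grid = long_df.set_index(["b", "a"])["x_value"].unstack("a")
print(grid)
```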

If your function can be written to take arrays, rather than scalars, then the values can be easily calculated without iteration. For example:

In [171]: I,J = np.meshgrid(b,a,indexing='ij')
In [172]: X = J*2 + I*.5
In [173]: X
Out[173]: 
array([[4. , 6. , 8. ],
       [4.5, 6.5, 8.5],
       [5. , 7. , 9. ]])
In [174]: I
Out[174]: 
array([[4, 4, 4],
       [5, 5, 5],
       [6, 6, 6]])
In [175]: J
Out[175]: 
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
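
Since I varies along the rows and J along the columns, X already has the layout of the target grid, so it can presumably be wrapped in a DataFrame directly. A minimal sketch (the name grid is illustrative):

```python
import numpy as np
import pandas as pd

a = [1, 2, 3]
b = [4, 5, 6]

# With indexing='ij', I follows b down the rows and J follows a across the columns.
I, J = np.meshgrid(b, a, indexing="ij")
X = J * 2 + I * 0.5

grid = pd.DataFrame(X, index=b, columns=a)
print(grid)
```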
hpaulj