Iterate over pairwise combinations of column names and row indices in pandas

Question

If I have the following pandas DataFrame:

>>> df
   x  y  z
x  1  3  0
y  0  5  0
z  0  3  4

I want to iterate over the pairwise combinations of column names and row indices and perform some operation on each cell. For example, for the pair x and y, replace the 3 with 'xy'. The desired output looks like:

>>> df
    x   y   z
x  xx  xy  xz
y  xy  yy  yz
z  xz  yz  zz

A naive attempt that I tried, and that doesn't work, is:

for i, j in range(0,2):
    df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]

Check out my answer: `df.set_value()` is far and away faster. Here is why: https://stackoverflow.com/questions/13842088/set-value-for-particular-cell-in-pandas-dataframe/24517695#24517695 – Vikash Singh Jul 31 '17 at 14:00

Scott Boston · Answer 1 · 2017-12-06T02:51:18.620

10

How about a simple one-liner, using Pandas DataFrame elements:

df.apply(lambda x: x.index+x.name)

Output:

    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz

Update: Using numpy.ufunc.outer method.

pd.DataFrame(np.add.outer(df.index, df.columns), index=df.index, columns=df.columns)

Output:

    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz
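
A self-contained sketch of the outer-sum idea, for the case raised in the comments below where the index and the column labels are not both strings. Converting each set of labels to a NumPy object array of Python strings first is an assumption made here for illustration, not part of the answer above; the integer index `[0, 1, 2]` and the names `rows`, `cols` and `labels` are likewise made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an integer index and string column names.
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=[0, 1, 2], columns=["x", "y", "z"])

# Cast both sets of labels to Python strings held in object arrays,
# so that "+" means string concatenation for every pair.
rows = np.array([str(label) for label in df.index], dtype=object)
cols = np.array([str(label) for label in df.columns], dtype=object)

# np.add.outer builds the full rows-by-columns grid of row + column.
labels = pd.DataFrame(np.add.outer(rows, cols),
                      index=df.index, columns=df.columns)
print(labels)
```

Working on plain object arrays keeps the string concatenation in NumPy and should avoid depending on how a given pandas version handles `+` between an Index and a label of a different dtype.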

edited Dec 06 '17 at 02:51

answered Jul 31 '17 at 13:52

Scott Boston


This is clever ! – MaxU - stand with Ukraine Jul 31 '17 at 14:03
Really nice solution +1 – Bharath M Shetty Jul 31 '17 at 14:03
getting this error: `TypeError: ('can only perform ops with scalar values', 'occurred at index x')` any idea why? – Vikash Singh Jul 31 '17 at 14:05
The TypeError happens because the column labels and the index do not have the same dtype. To fix it, cast both to the same dtype with astype, depending on whether you want string addition or integer addition. – Scott Boston Jun 25 '18 at 12:58

Vikash Singh · Answer 2 · 2017-08-02T05:55:51.780

2

df.set_value() is much faster; for the reason why, see "Set value for particular cell in pandas DataFrame".

import pandas as pd

data = [{'x': 1, 'y': 2, 'z': 3}, {'x': 4, 'y': 5, 'z': 6}, {'x': 7, 'y': 8, 'z': 9}]

df = pd.DataFrame.from_dict(data, orient='columns')

df = df.astype(str)

df

#       x   y   z
#    0  1   2   3
#    1  4   5   6
#    2  7   8   9


for idx, row in df.iterrows():
    for column in list(df.columns.values):
        val = str(idx) + str(column)
        df.set_value(idx, column, val)

df

output:

    x   y   z
0   0x  0y  0z
1   1x  1y  1z
2   2x  2y  2z

Note: set_value won't work if the column names are not unique (see https://github.com/cm3/lafayettedb_thumbnail_getter/issues/3 ). You will have to fix the non-unique column names separately.

If the exact column names don't matter, you can make them unique by prefixing each name with its column position:

df.columns = [str(idx) + '_' + name for idx, name in enumerate(df.columns)]
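
`set_value` was deprecated in pandas 0.21 and removed in pandas 1.0, so the loop above fails on current versions. Below is a minimal sketch of the same loop using the `.at` accessor, the documented way to set a single cell by label. The speed claim above was made about `set_value` and has not been re-measured for `.at` here.

```python
import pandas as pd

data = [{"x": 1, "y": 2, "z": 3},
        {"x": 4, "y": 5, "z": 6},
        {"x": 7, "y": 8, "z": 9}]

# Cast to strings up front so each cell can hold a text label.
df = pd.DataFrame(data).astype(str)

# Visit every (row label, column label) pair and write one cell at a time.
for idx in df.index:
    for column in df.columns:
        df.at[idx, column] = str(idx) + str(column)

print(df)
```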

edited Aug 02 '17 at 05:55

answered Jul 31 '17 at 13:48

Vikash Singh


do you know how to implement it when I have duplicated column names? – mallet Aug 01 '17 at 22:54
`set_value` won't work if column names are not unique https://github.com/cm3/lafayettedb_thumbnail_getter/issues/3 . You will have to separately fix the non-unique column name problem. – Vikash Singh Aug 02 '17 at 05:51
@VikashSingh I was revisiting some old posts and this is your solution for the TypeError using my answer. `df.apply(lambda x: x.index.astype(str)+x.name)` index and column need to be same dtype and have an add method for that dtype. – Scott Boston Jun 25 '18 at 12:53

score 1 · Answer 3 · answered Jul 31 '17 at 13:54

This should be really fast:

import numpy as np

grid = np.meshgrid(df.columns.values.astype(str),
                   df.index.values.astype(str))
# np.char.add is the public spelling of np.core.defchararray.add
result = np.char.add(*grid)

You can then assign result to either the same dataframe or another one.
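
A sketch of that last step, assuming the example frame from the question. The names `cols`, `rows`, `labels` and `out` are chosen for illustration. The row labels are passed first to the add call here so that each cell reads row + column, as in the other answers; unpacking the grid directly, as in the snippet above, puts the column label first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=list("xyz"), columns=list("xyz"))

# Fixed-width string arrays of the labels.
col_labels = np.array(df.columns.tolist(), dtype=str)
row_labels = np.array(df.index.tolist(), dtype=str)

# With the default indexing, cols[i, j] is column j and rows[i, j] is row i.
cols, rows = np.meshgrid(col_labels, row_labels)

# Element-wise string concatenation: row label followed by column label.
labels = np.char.add(rows, cols)

# Wrap the array in a new frame that reuses the original labels.
out = pd.DataFrame(labels, index=df.index, columns=df.columns)
print(out)
```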

score 0 · Accepted Answer · answered Jul 31 '17 at 13:44


Is this what you are looking for?

>>> df
   x  y  z
x  1  3  0
y  0  5  0
z  0  3  4

>>> for i in range(3):
...     for j in range(3):
...         df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]
...
>>> df
    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz
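
The same loop can be written without hard-coding `range(3)`. The sketch below is an illustration, not part of the answer as posted: it walks every (row label, column label) pair with `itertools.product`. The cast to `object` is an assumption added here, because recent pandas versions may warn about or reject writing strings into an integer column.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=list("xyz"), columns=list("xyz"))

# Let every column hold arbitrary Python objects, including strings.
df = df.astype(object)

# product() yields each row label paired with each column label.
for row, col in product(df.index, df.columns):
    df.loc[row, col] = row + col

print(df)
```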

answered Jul 31 '17 at 13:44

Constructor


That's what I was looking for. Couldn't get me head around it. – mallet Jul 31 '17 at 13:47

score 0 · Answer 5 · answered Jul 31 '17 at 13:46

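# Print the label pair for every cell, whatever the frame's size.
for row in df.index:
    for col in df.columns:
        print(row + col)

# Build a new frame of the same shape that holds those pairs.
df = pd.DataFrame([[row + col for col in df.columns] for row in df.index],
                  index=df.index, columns=df.columns)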

This way you can iterate over all the columns and paired rows dynamically without needing to know how many columns there are.
