2

Suppose I have the following pandas DataFrame:

>>> df
   x  y  z
x  1  3  0
y  0  5  0
z  0  3  4

I want to iterate over every pairing of row index and column name and perform an operation on the matching cell. For example, for the pair of x and y, replace the 3 with 'xy'. The desired output looks like:

>>> df
    x   y   z
x  xx  xy  xz
y  xy  yy  yz
z  xz  yz  zz

A naive attempt that I tried, which doesn't work:

for i, j in range(0,2):
    df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]
mallet
  • Check out my answer and this link: `df.set_value()` is far and away faster, link to why: https://stackoverflow.com/questions/13842088/set-value-for-particular-cell-in-pandas-dataframe/24517695#24517695 – Vikash Singh Jul 31 '17 at 14:00

5 Answers

10

How about a simple one-liner, using Pandas DataFrame elements:

df.apply(lambda x: x.index+x.name)

Output:

    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz

Update: using the numpy.ufunc.outer method (here, np.add.outer):

pd.DataFrame(np.add.outer(df.index, df.columns), index=df.index, columns=df.columns)

Output:

    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz
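
For anyone who wants to run the outer-sum idea end to end, here is a minimal sketch. It rebuilds the question's frame and converts the labels to object arrays before calling `np.add.outer`, so the addition is plain Python string concatenation. The explicit `to_numpy(dtype=object)` step and the name `labels` are my own assumptions, not part of the one-liner above.

```python
import numpy as np
import pandas as pd

# The frame from the question.
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=list("xyz"), columns=list("xyz"))

# Object arrays make np.add fall back to Python's str + str.
rows = df.index.to_numpy(dtype=object)
cols = df.columns.to_numpy(dtype=object)

# labels.iloc[i, j] is rows[i] + cols[j]
labels = pd.DataFrame(np.add.outer(rows, cols),
                      index=df.index, columns=df.columns)
```

Building a new frame this way leaves the original `df` untouched, which is handy if the numeric values are still needed.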
Scott Boston
2

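df.set_value() is way faster; see "Set value for particular cell in pandas DataFrame" (https://stackoverflow.com/questions/13842088/set-value-for-particular-cell-in-pandas-dataframe/24517695#24517695) for why. Note that set_value was deprecated in pandas 0.21 and removed in 1.0; df.at[row, column] = value is its replacement.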

import pandas as pd

data = [{'x': 1, 'y': 2, 'z': 3}, {'x': 4, 'y': 5, 'z': 6}, {'x': 7, 'y': 8, 'z': 9}]

df = pd.DataFrame.from_dict(data, orient='columns')

df = df.astype(str)

df

#       x   y   z
#    0  1   2   3
#    1  4   5   6
#    2  7   8   9


for idx in df.index:
    for column in df.columns:
        val = str(idx) + str(column)
        # set_value() was removed in pandas 1.0; .at is the scalar setter that replaced it
        df.at[idx, column] = val

df

output:

    x   y   z
0   0x  0y  0z
1   1x  1y  1z
2   2x  2y  2z

Note: set_value won't work if column names are not unique (see https://github.com/cm3/lafayettedb_thumbnail_getter/issues/3). You will have to fix the non-unique column names separately.

If you don't care about the column names, you can prefix each one with its column number:

df.columns = [str(idx) + '_' + name for idx, name in enumerate(df.columns)]
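
To make the renaming step concrete, here is a small sketch on a hypothetical frame with a repeated column name (the data and the name `df` are made up for illustration). It uses `.at` for the cell writes, on the assumption that you are on a pandas version where `set_value` is no longer available.

```python
import pandas as pd

# Hypothetical frame with a repeated column name.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["x", "x", "y"])

# Prefix each name with its position so every label is unique.
df.columns = [f"{pos}_{name}" for pos, name in enumerate(df.columns)]

# Cast to object so string labels can be written into the cells.
df = df.astype(object)

for idx in df.index:
    for column in df.columns:
        df.at[idx, column] = f"{idx}{column}"
```

Once the labels are unique, every (row, column) pair addresses exactly one cell, which is what a scalar setter needs.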
Vikash Singh
  • do you know how to implement it when I have duplicated column names? – mallet Aug 01 '17 at 22:54
  • `set_value` won't work if column names are not unique https://github.com/cm3/lafayettedb_thumbnail_getter/issues/3 . You will have to separately fix the non-unique column name problem. – Vikash Singh Aug 02 '17 at 05:51
  • @VikashSingh I was revisiting some old posts; this is the fix for the TypeError you get with my answer: `df.apply(lambda x: x.index.astype(str)+x.name)`. The index and column labels need to be the same dtype, and that dtype must support addition. – Scott Boston Jun 25 '18 at 12:53
1

This should be really fast:

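import numpy as np

# grid[0] repeats the column labels along each row, grid[1] repeats the row labels down each column
grid = np.meshgrid(df.columns.values.astype(str),
                   df.index.values.astype(str))
# np.char.add is the public name for np.core.defchararray.add;
# result[i, j] is column j's label followed by row i's label
result = np.char.add(*grid)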

You can then assign result to either the same dataframe or another one.
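
As a rough sketch of that last step, the snippet below rebuilds the question's frame and wraps the array in a new DataFrame. Note that I pass the row grid first so each cell reads row label then column label, matching the other answers; that ordering and the name `labels` are my choices here, so swap the arguments if you want the column label first.

```python
import numpy as np
import pandas as pd

# The frame from the question.
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=list("xyz"), columns=list("xyz"))

# With the default indexing, both grids have shape (n_rows, n_cols).
cols, rows = np.meshgrid(df.columns.to_numpy().astype(str),
                         df.index.to_numpy().astype(str))

# Row label first, then column label.
result = np.char.add(rows, cols)

# Reuse the original labels so the new frame lines up with df.
labels = pd.DataFrame(result, index=df.index, columns=df.columns)
```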

jdehesa
0

Is this what you are looking for?

>>> df
   x  y  z
x  1  3  0
y  0  5  0
z  0  3  4

>>> for i in range(3):
...     for j in range(3):
...         df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]
...
>>> df
    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz
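
If you would rather not hard-code `range(3)`, the same loop can be written over the labels themselves. This is only a sketch: `itertools.product` and the `astype(object)` cast are my additions. The cast is there because, as far as I know, recent pandas versions complain when a string is written into an integer column.

```python
from itertools import product

import pandas as pd

# The frame from the question.
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=list("xyz"), columns=list("xyz"))

# Assumption: cast to object first so integer columns can hold strings.
df = df.astype(object)

# product() yields every (row label, column label) pair exactly once.
for row, col in product(df.index, df.columns):
    df.loc[row, col] = row + col
```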
Constructor
0
for row in df.index:
    for col in df.columns:
        print(row + col)


df = pd.DataFrame([[row + col for col in df.columns] for row in df.index],
                  index=df.index, columns=df.columns)

This way you can iterate over all the columns and paired rows dynamically without needing to know how many columns there are.

Cory Madden