
I have a dataframe with three columns. Each row needs to be copied twice for each numeric column, with that column's value shifted by +0.1 in one copy and by -0.1 in the other. The values in the other columns need to stay the same.

I've managed to make the dataframe, as follows:

import pandas as pd

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3': ['A', 'B', 'C']})

idx = df['Value'].index

# construct dataframe to append
df_extra1 = df.loc[idx].copy()
df_extra2 = df.loc[idx].copy()
df_extra3 = df.loc[idx].copy()
df_extra4 = df.loc[idx].copy()


# shift one numeric column per copy by +0.1 or -0.1
df_extra1['Value'] = df_extra1['Value'] + 0.1
df_extra2['Value'] = df_extra2['Value'] - 0.1
df_extra3['Value2'] = df_extra3['Value2'] + 0.1
df_extra4['Value2'] = df_extra4['Value2'] - 0.1

# append to original (DataFrame.append was removed in pandas 2.0, so this needs an older version)
res1 = df.append(df_extra1)
res2 = res1.append(df_extra2)
res3 = res2.append(df_extra3)
res4 = res3.append(df_extra4)

This is the result (`res4`), which is also what it should look like:

   Value  Value2 Value3
0    0.0     0.0      A
1    1.0     1.0      B
2    2.0     2.0      C
0    0.1     0.0      A
1    1.1     1.0      B
2    2.1     2.0      C
0   -0.1     0.0      A
1    0.9     1.0      B
2    1.9     2.0      C
0    0.0     0.1      A
1    1.0     1.1      B
2    2.0     2.1      C
0    0.0    -0.1      A
1    1.0     0.9      B
2    2.0     1.9      C 

Is there any way to speed this up or make it more concise?

Floris
  • Directly accessing individual cells in pandas will be glacial because of the overhead of fancy indexing. Exporting the column to a NumPy array, applying the operation to that array, and then reassigning the array will be orders of magnitude faster. – Alex Huszagh Aug 23 '19 at 21:49
  • If you can apply uniform logic to the function (depends only on values), you can use `df.apply` which should be quite performant, but that does not look to be the case here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html – Alex Huszagh Aug 23 '19 at 21:51
  • For future readers: see cs95's answer (https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas), also (https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6). – angrymantis Aug 23 '19 at 22:46

1 Answer


It's not entirely clear what you're trying to do, but based on the example you provide you could simplify this by iterating over the product of the columns you're trying to update and the updates you're trying to apply:

import pandas as pd
from itertools import product

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3':['A','B','C']})

to_alter = ['Value', 'Value2']
constants = [0.1, -0.1]

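# start with the original, then add one shifted copy per (column, constant) pair
dfs = [df]
for col, const in product(to_alter, constants):
    t = df.copy()
    t[col] += const  # only this column changes; the others stay as they were
    dfs.append(t)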

result = pd.concat(dfs)

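Every call to `DataFrame.append` copies the whole dataframe again, which is not ideal, especially since you're already creating copies at the start. Collecting the pieces in a list and calling `pd.concat` once avoids those repeated copies.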
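If you want it shorter still, the same idea can be sketched with `DataFrame.assign`, which returns a modified copy without an explicit `copy()`. This is only a variation on the loop, not a different algorithm. It reuses the `to_alter` and `constants` names, and `ignore_index=True` is my assumption that you don't need the repeated 0, 1, 2 index from your expected output; drop that argument to keep it.

```python
import pandas as pd

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3': ['A', 'B', 'C']})

to_alter = ['Value', 'Value2']
constants = [0.1, -0.1]

# assign() returns a new dataframe with one column replaced,
# so each piece is a shifted copy and df itself is left untouched
pieces = [df] + [
    df.assign(**{col: df[col] + const})
    for col in to_alter
    for const in constants
]

# concatenate once; ignore_index=True renumbers the rows 0..14 (assumption, see above)
result = pd.concat(pieces, ignore_index=True)
```

The pieces come out in the same order as with `product(to_alter, constants)`: the original rows, then `Value` +0.1, `Value` -0.1, `Value2` +0.1, `Value2` -0.1.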

dan_g