I have a dataframe that contains X & Y data in columns like this:
import numpy as np
import pandas as pd

df_cols = ['x1', 'y1', 'x2', 'y2', 'x3', 'y3']
np.random.seed(365)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 6)), columns=df_cols)
   x1  y1  x2  y2  x3  y3
0   2   4   1   5   2   2
1   9   8   4   0   3   3
2   7   7   7   0   8   4
3   3   2   6   2   6   8
4   9   6   1   6   5   7
5   7   6   5   9   3   8
6   7   9   9   0   1   4
7   0   9   6   5   6   9
8   5   3   2   7   9   2
9   6   6   3   7   7   1
I need to call a function that takes one X & Y pair at a time and returns an updated X & Y pair of the same length. The result should either be saved to a new dataframe with the original column names, or replace the old X & Y data in place while keeping the original column names.
For example, take this function below:
def samplefunc(x, y):
    x = x*y
    y = x/10
    return x, y
# Apply function to each x & y pair
x1, y1 = samplefunc(df.x1, df.y1)
x2, y2 = samplefunc(df.x2, df.y2)
x3, y3 = samplefunc(df.x3, df.y3)
# Save new/updated x & y pairs into new dataframe, preserving the original column names
df_updated = pd.DataFrame({'x1': x1, 'y1': y1, 'x2': x2, 'y2': y2, 'x3': x3, 'y3': y3})
# Desired result:
In [36]: df_updated
Out[36]:
   x1   y1  x2   y2  x3   y3
0   8  0.8   5  0.5   4  0.4
1  72  7.2   0  0.0   9  0.9
2  49  4.9   0  0.0  32  3.2
3   6  0.6  12  1.2  48  4.8
4  54  5.4   6  0.6  35  3.5
5  42  4.2  45  4.5  24  2.4
6  63  6.3   0  0.0   4  0.4
7   0  0.0  30  3.0  54  5.4
8  15  1.5  14  1.4  18  1.8
9  36  3.6  21  2.1   7  0.7
But writing out every pair by hand like this is tedious and does not scale to a dataset with many X & Y pairs. The similar/related questions I've found perform a simple transformation on the data rather than calling a function, or they add new columns to the dataframe instead of replacing the originals.
I tried to apply both methods from @PaulH's answer to my dataset, but neither works, because it is unclear to me how to actually call the function inside either method.
# Method 1
array = np.array(my_actual_df)
df_cols = my_actual_df.columns
dist = 0.04 # a parameter I need for my function
df = (
    pandas.DataFrame(array, columns=df_cols)
    .rename_axis(index='idx', columns='label')
    .stack()
    .to_frame('value')
    .reset_index()
    .assign(value=lambda df: numpy.select(
        [df['label'].str.startswith('x'), df['label'].str.startswith('y')],
        # Call the function (not working):
        [df['value'], df['value']] = samplefunc(df['value'], df['value']),
    ))
    .pivot(index='idx', columns='label', values='value')
    .loc[:, df_cols]
)
# Method 2
df = (
    pandas.DataFrame(array, columns=df_cols)
    .pipe(lambda df: df.set_axis(df.columns.map(lambda c: (c[0], c[1])), axis='columns'))
    .rename_axis(columns=['which', 'group'])
    .stack(level='group')
    # Call the function (not working)
    .assign(df['x'], df['y'] = samplefunc(df['x'], df['y']))
    .unstack(level='group')
    .pipe(lambda df: df.set_axis([''.join(c) for c in df.columns], axis='columns'))
)
The actual function I need to call is from Arty's answer to this question: Resample trajectory to have equal euclidean distance in each sample
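To make the goal concrete, here is a minimal sketch of the kind of loop that would generalize the manual version above. It assumes every `x<N>` column has a matching `y<N>` column with the same suffix, and that the function returns two outputs of the same length as its inputs. `apply_to_pairs` is a hypothetical helper name of my own, not something from pandas, and the `**kwargs` pass-through is only an assumption about how an extra parameter such as `dist` might be forwarded.

```python
import numpy as np
import pandas as pd


def samplefunc(x, y):
    x = x * y
    y = x / 10
    return x, y


def apply_to_pairs(df, func, **kwargs):
    """Hypothetical helper: call func once per (x<N>, y<N>) column pair."""
    out = {}
    for xcol in [c for c in df.columns if c.startswith('x')]:
        ycol = 'y' + xcol[1:]  # assumes the y column shares the x column's suffix
        out[xcol], out[ycol] = func(df[xcol], df[ycol], **kwargs)
    # Rebuild the frame with the original column names, in the original order
    ordered = [c for c in df.columns if c in out]
    return pd.DataFrame(out, index=df.index)[ordered]


df_cols = ['x1', 'y1', 'x2', 'y2', 'x3', 'y3']
np.random.seed(365)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 6)), columns=df_cols)

df_updated = apply_to_pairs(df, samplefunc)
print(df_updated)
```

Collecting the results in a dict keyed by the original column names keeps the names intact without any stacking or pivoting, and assigning `df_updated` back to `df` would cover the "replace in place" variant.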