Can you specify columns for a calculation in a dataframe using values from another list/set?

Question

I have a dataframe with this format, generated by pivoting an Excel file:

Gene             Ref      y       z
Sample                                 
1             29.2877  29.0625  20.9868
2             29.9897  32.8044  25.8783
3             31.6335  34.7172  24.6268

I want to perform a calculation with columns Ref and y, then with Ref and z, to generate two new columns. I know how to do these individually, but I want something where I can specify how many columns need to be evaluated (y, z, etc.), based on a number_of_columns = len(list) object generated earlier. Ideally, I'd also like the new columns to have names taken from another set that is also generated earlier. The Ref column is always the first column, so I was trying something like this:

while number_of_columns != 0:
    for column in df[(number_of_columns + 2)].iteritems():
        df[set_of_names_of_new_columns] = df.(names_of_columns_to_use) + df[ref_column]
        number_of_columns -= 1

Obviously this doesn't work, but I have included it to show what I was thinking. It also isn't the real calculation, just a simplified version.

Any help very much appreciated!

Also, please include `names_of_columns_to_use` and `ref_column` so that this is reproducible. — David Erickson, Oct 28 '20 at 23:28
Please provide a [mcve], as well as the current and expected output. See [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391). — AMC, Oct 29 '20 at 00:07

score 0 · Answer 1 · answered Oct 28 '20 at 23:31

If your df isn't too wide, a simple for-loop approach will work just fine, as each column operation is vectorized. Just identify your reference column and columns of interest programmatically (by name rather than index value):

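# Reference column that every other column is combined with
ref = df["Ref"]
# Every column except the reference one
cols = [col for col in df.columns if col != "Ref"]

for col in cols:
    # Derive the new column name from the source column name
    new_col = "output_{}".format(col)
    df[new_col] = df[col] * ref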

result:

Gene          Ref        y        z     output_y    output_z
Sample
1         29.2877  29.0625  20.9868   851.173781  614.655102
2         29.9897  32.8044  25.8783   983.794115  776.082454
3         31.6335  34.7172  24.6268  1098.226546  779.031878
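
If the columns to use and the new column names come from collections built earlier, as the question describes, the same loop can pair them up with `zip`. The following is only a sketch under some assumptions: `cols_to_use` and `new_names` are hypothetical names standing in for those earlier objects, the dataframe is rebuilt by hand from the sample in the question, and multiplication stands in for the real calculation. Note that a Python set has no guaranteed order, so the names would need to be in a list (or another ordered sequence) for the pairing to be reliable.

```python
import pandas as pd

# Sample data copied from the question (rebuilt by hand, not read from Excel).
df = pd.DataFrame(
    {
        "Ref": [29.2877, 29.9897, 31.6335],
        "y": [29.0625, 32.8044, 34.7172],
        "z": [20.9868, 25.8783, 24.6268],
    },
    index=pd.Index([1, 2, 3], name="Sample"),
)

# Hypothetical inputs: ordered lists assumed to be generated earlier.
cols_to_use = ["y", "z"]
new_names = ["y_times_ref", "z_times_ref"]

ref = df["Ref"]

# Pair each source column with its new name; each assignment is vectorized.
for col, new_name in zip(cols_to_use, new_names):
    df[new_name] = df[col] * ref  # placeholder for the real calculation

print(df)
```

With this layout there is no need for a separate `number_of_columns` counter: the loop runs once per entry in `cols_to_use`.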

That is amazing! Thank you so much! I've spent the whole afternoon trying to do that! — JWS74, Oct 29 '20 at 02:13
