
I have a DataFrame in this format, generated by pivoting an Excel file:

Gene             Ref      y       z
Sample                                 
1             29.2877  29.0625  20.9868
2             29.9897  32.8044  25.8783
3             31.6335  34.7172  24.6268

I want to perform a calculation with columns Ref and y, then with Ref and z, to generate two new columns. I know how to do these individually, but I want something where I can specify how many columns need to be evaluated (y, z, etc.), based on a number_of_columns = len(list) object generated earlier. Ideally, I'd also like the new columns to have names taken from another set that is also generated earlier. The Ref column is always the first column, so I was trying something like this:

while number_of_columns != 0:
    for column in df[(number_of_columns + 2)].iteritems():
        df[set_of_names_of_new_columns] = df.(names_of_columns_to_use) + df[ref_column]
        number_of_columns -= 1

Obviously this doesn't work, but I have included it to show what I was thinking. It also isn't the real calculation, just a simplified stand-in.

Any help very much appreciated!

JWS74

1 Answer


If your df isn't too wide, a simple for-loop approach will work just fine, as each column operation is vectorized. Just identify your reference column and the columns of interest programmatically (by name rather than by index value):

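# Reference column, selected by name
ref = df["Ref"]
# Every other column is evaluated against the reference
cols = [col for col in df.columns if col != "Ref"]

for col in cols:
    new_col = "output_{}".format(col)
    # Multiplication stands in for the real calculation
    df[new_col] = df[col] * ref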

result:

Gene          Ref        y        z     output_y    output_z
Sample
1         29.2877  29.0625  20.9868   851.173781  614.655102
2         29.9897  32.8044  25.8783   983.794115  776.082454
3         31.6335  34.7172  24.6268  1098.226546  779.031878
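
The question also asks for the new column names to come from a list built earlier. A minimal sketch of that variant follows, pairing each input column with its output name via zip. The names cols_to_use and new_names are hypothetical stand-ins for the lists generated earlier, and addition is used only because the question's simplified example uses it:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {
        "Ref": [29.2877, 29.9897, 31.6335],
        "y": [29.0625, 32.8044, 34.7172],
        "z": [20.9868, 25.8783, 24.6268],
    },
    index=pd.Index([1, 2, 3], name="Sample"),
)
df.columns.name = "Gene"

# Hypothetical lists; in the question these are generated earlier.
cols_to_use = ["y", "z"]
new_names = ["calc_y", "calc_z"]

ref = df["Ref"]
for col, new_col in zip(cols_to_use, new_names):
    # Addition is a placeholder for the real calculation.
    df[new_col] = df[col] + ref

print(df)
```

Note that zip stops at the shorter of the two lists, so it may be worth checking that len(cols_to_use) == len(new_names) before the loop. No separate number_of_columns counter is needed, since the loop length follows from the lists themselves.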
anon01