0

I have a dataframe (df) as follows:

import pandas as pd

pd.DataFrame(
    {
        "Query Strain": {0: "YAL001C", 1: "YAL001C", 2: "YAL001C", 3: "YAL001C", 4: "YAL001C"},
        "Array Strain ID": {
            0: "YBL026W_tsa1065",
            1: "YBR123C_tsa568",
            2: "YBR160W_tsa150",
            3: "YBR167C_tsa1074",
            4: "YDL017W_tsa880",
        },
        "Genetic interaction score (ε)": {
            0: -0.3529,
            1: -0.5317,
            2: -0.2163,
            3: -0.5053,
            4: -0.2082,
        },
    }
)
Query Strain  Array Strain ID  Genetic interaction score (ε)
YAL001C       YBL026W_tsa1065                        -0.3529
YAL001C        YBR123C_tsa568                        -0.5317
YAL001C        YBR160W_tsa150                        -0.2163
YAL001C       YBR167C_tsa1074                        -0.5053
YAL001C        YDL017W_tsa880                        -0.2082

What I want to do is:

  1. Convert Array Strain ID values as column headers
  2. Then link Query Strain with Array Strain ID values by Genetic interaction score (ε) values

So the desired output df will look like:

Query Strain  YBL026W_tsa1065  YBR123C_tsa568  YBR160W_tsa150  YBR167C_tsa1074  YDL017W_tsa880
YAL001C               -0.3529         -0.5317         -0.2163          -0.5053         -0.2082
irahorecka
  • 1,447
  • 8
  • 25

1 Answers1

2

Try with pivot:

output = (df.pivot("Query Strain", 
                   "Array Strain ID", 
                   "Genetic interaction score (ε)")
            .rename_axis(None, axis=1))

>>> output
              YBL026W_tsa1065  YBR123C_tsa568  ...  YBR167C_tsa1074  YDL017W_tsa880
Query Strain                                   ...                                 
YAL001C               -0.3529         -0.5317  ...          -0.5053         -0.2082

[1 rows x 5 columns]
not_speshal
  • 22,093
  • 2
  • 15
  • 30