Context: I have been transforming a data frame, and it is currently formatted like this:

            sample       REF   ALT   #
1                a         A    G   1.0
2                a         C    T   4.0
3                a       GGG  AAC   1.0
5                b         A    G   1.0
6                b         C    T   4.0

and I'd like to pivot this data frame to have the data displayed this way:

1          REF       A    C    GGG    C
2          ALT       G    T    AAC    A
3           a        1    4     1     0
4           b        1    4     0     0

...

I was trying to use pivot

final_data = data_no_na.pivot(
columns=("sample", "REF", "ALT"), values="#").reset_index()

but it is not quite there yet.

How could I do this transformation?

Thank you in advance. Any help or link to the numpy or pandas documentation is very welcome, as I'm fairly new to Python.

  • Your output doesn't make a whole lot of sense. What's the logic behind the zeros in the desired output? Why are column *labels* becoming row *values* (REF and ALT)? – ddejohn Nov 05 '21 at 20:20
  • I'm trying to count the signatures/changes from a reference genome (REF) with a new variant (ALT) and some of those changes are not present in certain samples (hence the 0s). I'm using this input because I'd like to apply some regression later on (where the different signatures/changes have specific weights). The row values REF and ALT are simply for guidance, I'll most likely remove those later on (basically the positions [0,0] and [1,0] will be empty) –  Nov 05 '21 at 20:23
  • Slightly different from your output, but is `df.pivot_table(index="sample", columns=["REF", "ALT"], values="#", fill_value=0)` what you're looking for? I don't get where your last column is coming from; you might have left out a row in your input – user3483203 Nov 05 '21 at 20:29

1 Answer

You could move `sample`, `REF` and `ALT` into the index, then unstack `REF` and `ALT` into the columns:

(df.set_index(['sample', 'REF', 'ALT'])
   ['#']
   .unstack(['REF', 'ALT'], fill_value=0)
 )

Output:

REF       A    C  GGG
ALT       G    T  AAC
sample               
a       1.0  4.0  1.0
b       1.0  4.0  0.0
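
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The `df` below is rebuilt by hand from the five rows shown in the question, and the names `wide` and `wide_pt` are only illustrative. It also runs the `pivot_table` call suggested in the comments for comparison; note that `pivot_table` aggregates duplicate (sample, REF, ALT) rows (mean by default), which makes no difference for this input.

```python
import pandas as pd

# Input rebuilt by hand from the rows shown in the question.
df = pd.DataFrame({
    "sample": ["a", "a", "a", "b", "b"],
    "REF": ["A", "C", "GGG", "A", "C"],
    "ALT": ["G", "T", "AAC", "G", "T"],
    "#": [1.0, 4.0, 1.0, 1.0, 4.0],
})

# Move the three key columns into the index, keep the count column,
# then push REF and ALT up into the column labels.
# fill_value=0 fills (sample, REF, ALT) combinations absent from the input.
wide = (
    df.set_index(["sample", "REF", "ALT"])["#"]
      .unstack(["REF", "ALT"], fill_value=0)
)

# Alternative from the comments: pivot_table does the reshaping in one call.
wide_pt = df.pivot_table(
    index="sample", columns=["REF", "ALT"], values="#", fill_value=0
)

print(wide)
print(wide_pt)
```

Both results have one row per sample and a two-level column header (REF on top, ALT below). A REF/ALT pair that occurs in no sample at all, such as the C/A column in the desired output, will not appear, because nothing in the input mentions it.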
mozway