-1

I am trying to convert data from following Table into another summarized data.

Sample Position Reference Alternate No. of Reads
P6 10017884 C G 2
P7 10017884 C G 1
P8 10017884 C A 7
P98 10017884 C T,A 4
P114 10020433 A C 1
P98 10020433 A C 4

I am looking for output like following table. In some of the samples, there are data which are missing/empty, so it should be filled with 0 (zero). Any type of hint to solve this problem will be helpful in Linux/R or Python.

Position Reference Aletrnate P1 P2 P3 P4 P5 P6 P7 P8 P98 P114
10017884 C G 0 0 0 0 0 2 1 0 0 0
A 0 0 0 0 0 0 0 7 0 0
T,A 0 0 0 0 0 0 0 0 4 0
10020433 A C 0 0 0 0 0 0 0 0 4 1
neilfws
  • 32,751
  • 5
  • 50
  • 63

2 Answers2

0
import pandas as pd
df = pd.read_csv('test.csv')
df.pivot(['Position', 'Reference', 'Alternate'], 'Sample', 'No. of Reads').fillna(0)
locriacyber
  • 100
  • 4
0

Using R tidyr::pivot_wider, assuming data in dataframe df1:

library(tidyr)
df1 %>% 
  pivot_wider(names_from = "Sample", 
              values_from = "No. of Reads", 
              values_fill = 0)

Result:

# A tibble: 4 x 8
  Position Reference Alternate    P6    P7    P8   P98  P114
     <int> <chr>     <chr>     <int> <int> <int> <int> <int>
1 10017884 C         G             2     1     0     0     0
2 10017884 C         A             0     0     7     0     0
3 10017884 C         T,A           0     0     0     4     0
4 10020433 A         C             0     0     0     4     1

Data:

df1 <- read.table(text = "Sample    Position    Reference   Alternate   'No. of Reads'
P6  10017884    C   G   2
P7  10017884    C   G   1
P8  10017884    C   A   7
P98 10017884    C   T,A 4
P114    10020433    A   C   1
P98 10020433    A   C   4", header = TRUE, check.names = FALSE)
neilfws
  • 32,751
  • 5
  • 50
  • 63