Is there any way to summarize data by reformatting?

Question

I am trying to convert data from following Table into another summarized data.

Sample	Position	Reference	Alternate	No. of Reads
P6	10017884	C	G	2
P7	10017884	C	G	1
P8	10017884	C	A	7
P98	10017884	C	T,A	4
P114	10020433	A	C	1
P98	10020433	A	C	4

I am looking for output like following table. In some of the samples, there are data which are missing/empty, so it should be filled with 0 (zero). Any type of hint to solve this problem will be helpful in Linux/R or Python.

Position	Reference	Aletrnate	P6	P7	P8	P98	P114
10017884	C	G	2	1	0	0	0
		A	0	0	7	0	0
		T,A	0	0	0	4	0
10020433	A	C	0	0	0	4	1

score 0 · Answer 1 · answered Jan 31 '21 at 20:34

0

import pandas as pd
df = pd.read_csv('test.csv')
df.pivot(['Position', 'Reference', 'Alternate'], 'Sample', 'No. of Reads').fillna(0)

answered Jan 31 '21 at 20:34

locriacyber

100
4

score 0 · Accepted Answer · answered Jan 31 '21 at 21:30

Using R tidyr::pivot_wider, assuming data in dataframe df1:

library(tidyr)
df1 %>% 
  pivot_wider(names_from = "Sample", 
              values_from = "No. of Reads", 
              values_fill = 0)

Result:

# A tibble: 4 x 8
  Position Reference Alternate    P6    P7    P8   P98  P114
     <int> <chr>     <chr>     <int> <int> <int> <int> <int>
1 10017884 C         G             2     1     0     0     0
2 10017884 C         A             0     0     7     0     0
3 10017884 C         T,A           0     0     0     4     0
4 10020433 A         C             0     0     0     4     1

Data:

df1 <- read.table(text = "Sample    Position    Reference   Alternate   'No. of Reads'
P6  10017884    C   G   2
P7  10017884    C   G   1
P8  10017884    C   A   7
P98 10017884    C   T,A 4
P114    10020433    A   C   1
P98 10020433    A   C   4", header = TRUE, check.names = FALSE)

Is there any way to summarize data by reformatting?

2 Answers2