
I have a data frame with three columns:

Sample Gene-name FPKM
A1 BRCA1 2.0
B1 LATS1 3.4
C1 WWTR 4.6
D1 FAT1 5.2

My desired format is:

BRCA1 LATS1 WWTR FAT1
A1 2.0 0 0 0
B1 0 3.4 0 0
C1 0 0 4.6 0
D1 0 0 0 5.2

I used the following code:

Reshaping <- X_df %>% dcast(sample ~ Gene-name, value.var = "FPKM", fun.aggregate = NULL)

But it's throwing this message: Aggregation function missing: defaulting to length

I am getting output in wide format, but every value is just 1. Where are the FPKM values going? What am I doing wrong?

NIA
  • It seems you're using the tidyverse, so why not [`pivot_wider()`](https://tidyr.tidyverse.org/reference/pivot_wider.html)? In any case, the message says you should specify the aggregation function; since none is given, it defaults to `length`. Looking at your sample data, try `sum` (with your `dcast` call otherwise written correctly). – s__ Aug 19 '21 at 07:45
  • Hi. Adding "sum" as the aggregate function worked, but I don't understand the logic behind it. – NIA Aug 19 '21 at 09:17
  • With that dcast you need to pass a function that aggregates the rows that become columns. In your case `sum` is fine because there is nothing to sum. You have to set it explicitly because otherwise it defaults to `length`, which is more or less a count (similar to how pivot tables work in Excel). – s__ Aug 19 '21 at 09:21
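
To make the comments above concrete, here is a minimal sketch of the difference between the `length` fallback and an explicit `sum`. It assumes data.table's `dcast` (reshape2 offers a `dcast` with comparable arguments), and the table `long_dt` with its deliberately duplicated A1/BRCA1 row is hypothetical illustration data, not the asker's real data:

```r
library(data.table)

# Hypothetical long-format data; A1/BRCA1 appears twice on purpose
long_dt <- data.table(
  Sample    = c("A1", "A1", "B1", "C1", "D1"),
  Gene_name = c("BRCA1", "BRCA1", "LATS1", "WWTR", "FAT1"),
  FPKM      = c(2.0, 1.0, 3.4, 4.6, 5.2)
)

# No aggregation function: dcast falls back to length(), so each cell
# holds the number of matching rows rather than an FPKM value
counts_wide <- suppressWarnings(suppressMessages(
  dcast(long_dt, Sample ~ Gene_name, value.var = "FPKM")
))

# Explicit aggregation: duplicated rows are summed, and
# Sample/gene combinations that never occur are filled with 0
sums_wide <- dcast(long_dt, Sample ~ Gene_name, value.var = "FPKM",
                   fun.aggregate = sum, fill = 0)

print(counts_wide)
print(sums_wide)
```

If every Sample/gene pair occurs only once, `sum` simply returns the single FPKM value, which is why it looks like nothing is being aggregated.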

1 Answer


As suggested in the comment by @s__, tidyr::pivot_wider will do the job:

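library(data.table)  # data.table()
library(tidyr)       # pivot_wider() and the %>% pipe

# Rebuild the sample data from the question
Sample <- c("A1", "B1", "C1", "D1")
Gene_name <- c("BRCA1", "LATS1", "WWTR", "FAT1")
FPKM <- c(2.0, 3.4, 4.6, 5.2)
df <- data.table(Sample, Gene_name, FPKM)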
> df
   Sample Gene_name FPKM
1:     A1     BRCA1  2.0
2:     B1     LATS1  3.4
3:     C1      WWTR  4.6
4:     D1      FAT1  5.2

Then you can use:

df2 <- df %>% pivot_wider(names_from = Gene_name, values_from = FPKM, values_fill = 0)
> df2
# A tibble: 4 x 5
  Sample BRCA1 LATS1  WWTR  FAT1
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 A1         2   0     0     0  
2 B1         0   3.4   0     0  
3 C1         0   0     4.6   0  
4 D1         0   0     0     5.2

If you don't specify values_fill, tidyr fills missing values with NA by default.
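
As a small sketch of that default, the snippet below compares the two calls side by side. The data frame `long_df` is a hypothetical two-row subset of the sample data, and it assumes a tidyr version that accepts a scalar `values_fill`, as the call above does:

```r
library(tidyr)

# Hypothetical two-row subset of the sample data
long_df <- data.frame(
  Sample    = c("A1", "B1"),
  Gene_name = c("BRCA1", "LATS1"),
  FPKM      = c(2.0, 3.4)
)

# Default: Sample/gene combinations that do not occur become NA
wide_na <- pivot_wider(long_df, names_from = Gene_name, values_from = FPKM)

# With values_fill = 0 the same cells are set to 0 instead
wide_zero <- pivot_wider(long_df, names_from = Gene_name, values_from = FPKM,
                         values_fill = 0)

print(wide_na)
print(wide_zero)
```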

Mata