
I was attempting to re-arrange some data but ran into some problems and would greatly appreciate any advice or suggestions you may have.

Background: I measured the value of three genes (FTH1, TFR1, VEGF) in a sample called A, three times each. Some of the third-run measurements were not recorded, which is why some genes have two values and others have three. The data in long form is shown below:

          Sample   Gene    Value
1         A        FTH1    19.287
2         A        FTH1    18.411
3         A        TFR1    21.536
4         A        TFR1    22.528
5         A        TFR1    20.255
6         A        VEGF    14.414
7         A        VEGF    14.009

I would like to reshape this data into the following format for easier down-stream analysis:

Sample  FTH1    TFR1    VEGF
A       19.287  21.536  14.414
A       18.411  22.528  14.009
A       N/A     20.255  N/A

What would be the best way to go about reformatting the data into the form above?

I tried using dcast as below

library(reshape2)
library(tidyverse)

data = read.csv("data.csv")

dcast(data, Sample ~ Gene, value = "Value")

but was met with the following error:

Aggregation function missing: defaulting to length
Error in .fun(.value[0], ...) : 
  2 arguments passed to 'length' which requires 1

I think this is happening because some genes (i.e. FTH1 and VEGF) have two entries whereas TFR1 has three, but I'm not 100% sure. Any advice on how to accomplish this reshape would be greatly appreciated!

Mangoplant

1 Answer


According to ?reshape2::dcast, the usage is

dcast(data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = guess_value(data))

So the argument is value.var, not value:

dcast(data, Sample ~ Gene, value.var = "Value")

Also, because each Gene has several entries for the same Sample, a sequence column is needed to tell the repeated measurements apart:

library(data.table)
dcast(setDT(data), rowid(Gene) + Sample ~ Gene, value.var = "Value")[,
        Gene := NULL][]
#   Sample   FTH1   TFR1   VEGF
#1:      A 19.287 21.536 14.414
#2:      A 18.411 22.528 14.009
#3:      A     NA 20.255     NA
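
Since the question already loads tidyverse, the same sequence-column idea can be sketched with dplyr and tidyr. This is only a minimal sketch, not part of the original answer: the helper column name `run` is hypothetical, and it assumes the row order within each gene reflects the order of the measurements.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the example is self-contained
data <- data.frame(
  Sample = rep("A", 7),
  Gene   = c("FTH1", "FTH1", "TFR1", "TFR1", "TFR1", "VEGF", "VEGF"),
  Value  = c(19.287, 18.411, 21.536, 22.528, 20.255, 14.414, 14.009)
)

wide <- data %>%
  group_by(Sample, Gene) %>%
  # 'run' is a hypothetical helper: 1, 2, 3 within each Sample/Gene pair
  mutate(run = row_number()) %>%
  ungroup() %>%
  # Sample and run together identify a row, so no aggregation is needed
  pivot_wider(names_from = Gene, values_from = Value) %>%
  # drop the helper once the rows have been aligned
  select(-run)

wide
```

Genes with fewer measurements are padded with NA in the later rows, as in the data.table result.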

data

data <- structure(list(Sample = c("A", "A", "A", "A", "A", "A", "A"), 
    Gene = c("FTH1", "FTH1", "TFR1", "TFR1", "TFR1", "VEGF", 
    "VEGF"), Value = c(19.287, 18.411, 21.536, 22.528, 20.255, 
    14.414, 14.009)), class = "data.frame", row.names = c(NA, 
-7L))
akrun