I used functions from this answer to read multiple files and combine them into one data table. I want each FileName to become its own column, and any smallRNA that does not appear in a given file to be filled with 0 in that file's column.
Part of the dataset:
dput(dt[1:4])
structure(list(FileName = c("Sample_4C_NaIO4", "Sample_4C_NaIO4",
"Sample_4C_NaIO4", "Sample_4C_NaIO4"), smallRNA = c("TCGTACGACTCTTAGCGG",
"GTACGACTCTTAGCGG", "CTCGTACGACTCTTAGCGG", "CGTACGACTCTTAGCGG"
), counts = c(4166178L, 564940L, 89932L, 52670L)), class = c("data.table",
"data.frame"), row.names = c(NA, -4L), .internal.selfref = <pointer: 0x180a460>)
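My code:
library(data.table)
# Match only names ending in ".txt" (the pattern is a regular expression)
temp <- list.files(pattern = "\\.txt$")
dt <- rbindlist(sapply(temp, fread, simplify = FALSE),
use.names = TRUE, idcol = "FileName")
# Strip the ".txt" extension from the file names
dt$FileName <- gsub("\\.txt$", "", dt$FileName)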
finaldt <- dcast.data.table(dt, smallRNA+counts ~FileName,
drop=FALSE,fill=0)
Result:
finaldt <- dcast.data.table(dt,smallRNA+counts ~ FileName,drop = FALSE,fill = 0)
Using 'counts' as value column. Use 'value.var' to override
Error in CJ(smallRNA = c("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAA", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAG", :
Cross product of elements provided to CJ() would result in 70585808594 rows which exceeds .Machine$integer.max == 2147483647
I thought of using the bit64 package, but I'm not sure how.
Version:
version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 5.1
year 2018
month 07
day 02
svn rev 74947
language R
version.string R version 3.5.1 (2018-07-02)
nickname Feather Spray
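Edit 1
The last part of the code must be changed so that counts is the value column rather than part of the formula (note that value.var takes the column name as a string):
finaldt <- dcast.data.table(dt, smallRNA ~ FileName,
drop = FALSE, fill = 0, value.var = "counts")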
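For illustration, here is a minimal sketch of that reshape on toy data. The file names "A" and "B", the short sequences, and the counts are made up and only stand in for the real dt; it assumes the data.table package is installed.

```r
library(data.table)

# Toy stand-in for the combined table (hypothetical values, not the real data)
dt_toy <- data.table(
  FileName = c("A", "A", "B"),
  smallRNA = c("TCGTACG", "GTACGAC", "TCGTACG"),
  counts   = c(10L, 5L, 7L)
)

# One row per smallRNA, one column per file.
# value.var is given as a string; combinations absent from a file become 0.
wide_toy <- dcast(dt_toy, smallRNA ~ FileName,
                  value.var = "counts", fill = 0L)

print(wide_toy)
```

Here "GTACGAC" occurs only in file "A", so its entry in column "B" is filled with 0. Judging from the CJ() error message, keeping counts on the left-hand side together with drop = FALSE appears to make dcast enumerate every smallRNA/counts combination, which is why moving counts to value.var avoids the oversized cross product.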
Edit 2: issue with values lower than 1
In the combined dataset "dt" there are no values lower than 1:
filter(dt,counts<1)
[1] FileName smallRNA counts
<0 rows> (or 0-length row.names)
However, the first of the individual files (myfiles[[1]]) does contain such values:
> myfiles[[1]] %>% filter(counts<1) %>% tail()
# A tibble: 6 x 2
smallRNA counts
<chr> <dbl>
1 ENST00000592744.1 ncrna chromosome:GRCh38:9:81946438:81976806:-1 gene:ENSG00000267559… 0.00106
2 ENST00000594089.1 ncrna chromosome:GRCh38:11:64778954:64779405:1 gene:ENSG00000269038… 0.00106
3 ENST00000607991.1 ncrna chromosome:GRCh38:22:38743495:38743910:1 gene:ENSG00000273076… 0.00106
4 ENST00000608972.1 ncrna chromosome:GRCh38:7:29008926:29010252:1 gene:ENSG00000272568.… 0.00106
5 ENST00000618845.1 ncrna chromosome:GRCh38:14:49863072:49864379:1 gene:ENSG00000278002… 0.00106
6 ENST00000625800.1 ncrna chromosome:GRCh38:CHR_HG2232_PATCH:233205199:233205479:1 gene… 0.00106
Is there a way to keep these values in the combined table as well?