
I have a list of files named with the pattern sampleid_samplename_counts.csv. Each file is a count matrix with cell names in rows and genes in columns.

I'm trying to build count matrices from these files to load into the Seurat package, which needs cell names in columns and genes in rows.

I managed to get the matrices with the following loop, where x is a vector of all the .csv filenames (I had to create a new name for each object, as I couldn't reassign "i" within the loop):

# read each CSV, transpose it and store the result as "<filename>.counts"
for (i in x) {
  assign(paste0(i, ".counts"), t(read.table(i, sep = ",")))
}

But the resulting matrices do not have proper column names (only V1, V2, V3, ...), and row 1 of each matrix contains the cell names.

The following sets the correct column names for a single matrix:

colnames(file.counts) <- file.counts[1,]
file.counts <- file.counts[-1,]

But I can't get this to work inside a for() loop. How can I do it in the initial loop, or in a second one?
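A minimal sketch of one way to do this, assuming the files have the layout shown in the EDIT below (empty top-left cell, gene names in the first line, cell IDs in the first column): apply the two-line fix to each matrix before assign() stores it, via a small helper. The helper name read_counts_transposed() and the example file are hypothetical. The sketch also reads every column as character, because otherwise t() appears to shorten the long numeric cell IDs to 7 significant digits, which would make the new column names collide.

```r
# Hypothetical example input in the same layout as the question's files
example_file <- file.path(tempdir(), "s1_sampleA_counts.csv")
writeLines(c(",DPM1,SCYL3,FGR",
             "121270371765165,,,",
             "121270372580596,,,1.0"), example_file)
x <- example_file  # in practice: the vector of all .csv filenames

# Hypothetical helper: read one file, transpose it, use row 1 as column names
read_counts_transposed <- function(path) {
  # colClasses = "character" keeps the long cell IDs exactly as written
  m <- t(read.table(path, sep = ",", colClasses = "character"))
  colnames(m) <- m[1, ]        # row 1 holds the cell IDs
  m[-1, , drop = FALSE]        # drop that row; gene names remain in column 1
}

for (i in x) {
  # basename() keeps any directory path out of the object name
  assign(paste0(basename(i), ".counts"), read_counts_transposed(i))
}
```

The result can then be retrieved with get("s1_sampleA_counts.csv.counts"). Collecting the matrices in a named list with lapply() instead would avoid assign()/get() altogether.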

EDIT:

Here is what one of the original .csv files looks like after a plain read.table() (dput() output):

structure(list(V1 = c(NA, 121270371765165, 121270372580596, 121270373898541, 
121270374395228, 121270374676403, 121270375926581, 121270376000796, 
121270376290589, 121270378289958, 121270378621156, 121347513957787, 
121347516024694, 121347517659934, 121347518125797, 121347519671644, 
121347519760734, 121347519921075, 121347520489203, 121896195804531
), V2 = c("DPM1", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", ""), V3 = c("SCYL3", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""
), V4 = c("FGR", "", "", "", "", "", "", "", "", "1.0", "", "", 
"", "", "", "", "", "", "", ""), V5 = c("CFH", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", ""), 
    V6 = c("FUCA2", "", "", "", "", "", "", "", "", "", "", "", 
    "", "", "", "", "", "", "", ""), V7 = c("GCLC", "", "", "", 
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
    ""), V8 = c("NFYA", "", "", "", "", "", "", "", "", "", "", 
    "", "", "", "", "", "", "", "", ""), V9 = c("NIPAL3", "", 
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
    "", "", ""), V10 = c("LAS1L", "", "", "", "", "", "", "", 
    "", "", "", "", "", "", "", "", "", "", "", "")), row.names = c(NA, 
20L), class = "data.frame")

And this is what it looks like after transposing with t():

structure(c(NA, "DPM1", "SCYL3", "FGR", "CFH", "FUCA2", "GCLC", 
"NFYA", "NIPAL3", "LAS1L", "ENPP4", "ANKIB1", "KRIT1", "RAD52", 
"BAD", "LAP3", "CD99", "MAD1L1", "LASP1", "SNX11", "1.212704e+14", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "1.212704e+14", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "1.0", "", "", "", "1.212704e+14", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "1.212704e+14", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "1.212704e+14", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "1.0", 
"", "1.212704e+14", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "1.212704e+14", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "1.212704e+14", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "1.212704e+14", "", "", "1.0", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", ""), .Dim = c(20L, 10L
), .Dimnames = list(c("V1", "V2", "V3", "V4", "V5", "V6", "V7", 
"V8", "V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16", 
"V17", "V18", "V19", "V20"), NULL))
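Note that all cell IDs in row 1 have become "1.212704e+14". This seems to come from t() itself: for a data frame that mixes numeric and character columns it goes through as.matrix(), which converts numeric columns with format(), i.e. to 7 significant digits by default. The IDs then no longer identify cells, and column names built from them would collide. A small demonstration on a toy input (values copied from the dput above), comparing the default read with colClasses = "character":

```r
# Toy input mimicking the file layout (hypothetical, values from the dput)
txt <- ",DPM1\n121270371765165,\n121270372580596,1.0"

# Default read: V1 is numeric, V2 is not, so t() formats V1 with format()
m_default <- t(read.table(text = txt, sep = ","))
m_default[1, ]   # c(NA, "1.212704e+14", "1.212704e+14")

# Reading every column as character keeps the IDs exactly as written
m_char <- t(read.table(text = txt, sep = ",", colClasses = "character"))
m_char[1, ]      # c("", "121270371765165", "121270372580596")
```

With colClasses = "character" the counts also come back as strings, so they would still need converting to numbers (e.g. with as.numeric()) before going into Seurat.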
Sebrw
  • some sample data/files would be helpful... – Wimpel Jun 02 '20 at 13:19
  • How can I share them? I'm rather new to the forum. – Sebrw Jun 02 '20 at 13:24
  • The original files I use come from here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114725 (count matrices with genes in columns and cells in rows) – Sebrw Jun 02 '20 at 13:26
  • Please take a look at [How to make a great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and [How to ask](https://stackoverflow.com/help/how-to-ask). – Martin Gal Jun 02 '20 at 13:33
  • Sebrw, it's not reasonable to ask someone to download an 83Mb archive to help answer your question. Please see the link Martin provided. – Ian Campbell Jun 02 '20 at 13:39
  • Sorry for the inconvenience. I've edited the post; hopefully it's better now. The output should look just like the transposed matrix, but with the first row as column names. – Sebrw Jun 02 '20 at 13:47

1 Answer


Perhaps this helps...

library( data.table )

#build list of csv.gz files (I only kept two files in the dir)
files.to.read <- list.files( "./temp/GSE114725_RAW/", pattern = "\\.csv\\.gz$", full.names = TRUE )


#read each file into a list of data.tables
L <- lapply( files.to.read, data.table::fread )

#transpose each list element
L.transposed <- lapply( L, data.table::transpose, keep.names = "gene", make.names = "V1" )
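To make keep.names and make.names less opaque, here is a rough base-R sketch of what that call does for a table whose non-ID columns are all numeric: the V1 column (cell IDs) supplies the new column names, and the old column names (genes) go into a new gene column. The function name transpose_like() is hypothetical; this is not data.table's implementation.

```r
# Hypothetical base-R sketch of what
#   data.table::transpose(x, keep.names = "gene", make.names = "V1")
# does for a table whose non-ID columns are all numeric
transpose_like <- function(df, keep.names = "gene", make.names = "V1") {
  df <- as.data.frame(df)                           # plain data.frame indexing
  new_names <- as.character(df[[make.names]])       # cell IDs -> new colnames
  body <- as.matrix(df[setdiff(names(df), make.names)])  # cells x genes
  m <- t(body)                                      # genes x cells
  colnames(m) <- new_names
  out <- data.frame(rownames(m), m, check.names = FALSE,
                    stringsAsFactors = FALSE, row.names = NULL)
  names(out)[1] <- keep.names                       # old colnames (genes)
  out
}

# Toy input shaped like an fread() result: V1 = cell IDs, one column per gene
df <- data.frame(V1 = c(121270371765165, 121270372580596),
                 DPM1 = c(NA, 2), FGR = c(1, NA))
res <- transpose_like(df)
```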
#set list names based on filenames
names(L.transposed) <- basename( files.to.read )

#assign each list element to its own data.table, named after its file
for (i in seq_along(L.transposed)) assign(names(L.transposed)[i], L.transposed[[i]])
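If the goal is a gene-by-cell matrix for Seurat, each transposed table still holds the gene names in a column rather than as row names. A minimal base-R sketch, assuming the gene column is called gene (as set via keep.names above), that the remaining columns are numeric, and that missing (empty) entries mean zero counts; to_count_matrix() is a hypothetical name.

```r
# Hypothetical helper: turn a transposed table (a "gene" column plus one
# column per cell) into a numeric genes x cells matrix with gene rownames
to_count_matrix <- function(tab, gene_col = "gene") {
  df <- as.data.frame(tab)   # plain data.frame indexing, also for data.tables
  m <- as.matrix(df[setdiff(names(df), gene_col)])
  rownames(m) <- df[[gene_col]]
  m[is.na(m)] <- 0           # assumption: empty/missing entries mean zero counts
  storage.mode(m) <- "double"
  m
}

# Toy input shaped like one element of L.transposed
tab <- data.frame(gene = c("DPM1", "FGR"),
                  `121270371765165` = c(NA, 1),
                  `121270372580596` = c(2, NA),
                  check.names = FALSE)
counts <- to_count_matrix(tab)
```

Applied to the whole list, e.g. count.list <- lapply(L.transposed, to_count_matrix), each element then has genes as row names and cells as column names, the orientation Seurat's CreateSeuratObject(counts = ...) works with.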


Wimpel