
I have tens of CSV files. In R, I want to read just the 2nd and 3rd columns of each file, then merge them together by the column named gene, naming each count column after its file name. These are the columns I want to read:

rep.B0.t1month

         gene             count
 SPAP11E10.01             12608
  SPAC20G8.02              1218
 SPBC23G7.12c             10306
  SPBC17G9.09              8635
 SPAC1639.02c              5909
 SPCC1739.08c              5700
  SPCC569.08c              5283

rep.A0.t1month

           gene            counts
       SPAC343.09             19722
     SPAP11E10.01             15958
      SPAC20G8.02             13849
     SPBC32F12.09             11276
     SPBC23G7.12c              9054
       SPBC3F6.05              7703
     SPBPB8B6.04c              6553

I want to end up with something like this, but for tens of files:

            gene   rep.A0.t1month   rep.B0.t1month
      SPAC20G8.02        12100         4508
               wt         9825         2625
     SPAP11E10.01         8960         2904
     SPBC32F12.09         8302         3956
     SPBC23G7.12c         7636         1708
      SPCC1919.01         6568         1950
      SPCC4G3.05c         6486         3682
r2evans
ucb
  • Welcome to Stack Overflow. Please don’t use images of data as they cannot be used without a lot of unnecessary effort. [Questions should be reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Peter Aug 28 '21 at 11:22
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Aug 28 '21 at 21:00

1 Answer


Edit: updated for the changed data structure.

Previously you said "1st and 3rd column", but now it is a two-column table, so the subsetting I originally suggested, [,c(1,3)], makes no sense. Your question currently says "2nd and 3rd columns", yet your data still has only two columns, so I'm going to follow the columns and names shown in your sample data and ignore the rest.
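To illustrate why positional subsetting breaks here: asking a two-column data frame for column 3 makes `[.data.frame` stop with "undefined columns selected". The sketch below assumes, as a hypothetical rule, that the key column is always named gene and the count is always the last column (its header is "count" in one sample and "counts" in the other), and selects by name instead of position.

```r
# Toy two-column table using a few rows of the sample data
x <- data.frame(gene = c("SPAP11E10.01", "SPAC20G8.02", "SPBC23G7.12c"),
                count = c(12608L, 1218L, 10306L))

# Positional subsetting asks for a third column that does not exist,
# so this returns the error condition instead of a data frame.
res <- tryCatch(x[, c(1, 3)], error = function(e) e)
print(conditionMessage(res))

# Hypothetical helper: keep "gene" by name plus whatever the last column is,
# so it works whether the count header is "count" or "counts".
keep_gene_and_count <- function(df) df[, c("gene", names(df)[ncol(df)])]

y <- keep_gene_and_count(x)
str(y)
```

Selecting by name keeps the code working if a file gains or loses a leading column, as long as the assumed layout (gene present, count last) holds.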

When I create two files from the data you provided above, I can run:

Reduce(function(a, b) merge(a, b, by = 1L, all = TRUE),
       lapply(list.files(pattern = "^rep", full.names = TRUE),
              function(fn) {
                x <- read.table(fn, header = TRUE)
                names(x)[2] <- basename(fn)
                x
              }))
#            gene rep.A0.t1month rep.B0.t1month
# 1  SPAC1639.02c             NA           5909
# 2   SPAC20G8.02          13849           1218
# 3    SPAC343.09          19722             NA
# 4  SPAP11E10.01          15958          12608
# 5   SPBC17G9.09             NA           8635
# 6  SPBC23G7.12c           9054          10306
# 7  SPBC32F12.09          11276             NA
# 8    SPBC3F6.05           7703             NA
# 9  SPBPB8B6.04c           6553             NA
# 10 SPCC1739.08c             NA           5700
# 11  SPCC569.08c             NA           5283

I can't reproduce your expected output, since it is inconsistent with your sample data (the counts differ, and wt appears in neither input) and the two files have unmatched gene values. I hope this is helpful.

Here I did a full join, but you can change it to a left join by replacing all = TRUE with all.x = TRUE.
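To see what that argument changes, here is a small sketch on two toy tables built from a few rows of the sample data (the column names stand in for file names, as the lapply() step produces). merge() with neither all nor all.x is an inner join; all.x = TRUE keeps every row of the left table; all = TRUE keeps every row of both.

```r
# Two tiny per-file tables, count columns already named after the files
a <- data.frame(gene = c("SPAC343.09", "SPAP11E10.01", "SPAC20G8.02"),
                rep.A0.t1month = c(19722L, 15958L, 13849L))
b <- data.frame(gene = c("SPAP11E10.01", "SPAC20G8.02", "SPBC17G9.09"),
                rep.B0.t1month = c(12608L, 1218L, 8635L))

inner <- merge(a, b, by = "gene")                # only genes present in both
left  <- merge(a, b, by = "gene", all.x = TRUE)  # every gene in a, NA if absent from b
full  <- merge(a, b, by = "gene", all = TRUE)    # every gene in either table

sapply(list(inner = inner, left = left, full = full), nrow)
# returns c(inner = 2, left = 3, full = 4)
```

Inside Reduce(), the left table is the accumulated result, so with all.x = TRUE the genes of the first file read are the ones kept.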


Original code, for reference.

files <- list.files("path", pattern = "csv$", full.names = TRUE)
out <- Reduce(function(a,b) merge(a, b, by = 1L), lapply(files, function(fn) {
  x <- read.csv(fn)[,c(1,3)]
  names(x)[2] <- basename(fn)
  x
}))
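A minimal sketch adapting that read.csv() approach to what the comments describe (keep columns 2 and 3, left join). It assumes, hypothetically, that each .csv file has some leading column followed by gene in column 2 and the count in column 3; write.csv() with its default row names produces exactly that layout, so the demo writes its own two files into a temporary directory. The helper name read_counts and the directory name counts_demo are illustrative only.

```r
# Demo setup: write two small CSV files (rows from the sample data) into a
# fresh temp directory. write.csv() adds a row-name column first, so gene
# and count land in columns 2 and 3.
dir <- file.path(tempdir(), "counts_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(gene = c("SPAC343.09", "SPAP11E10.01", "SPAC20G8.02", "SPBC32F12.09"),
                     counts = c(19722L, 15958L, 13849L, 11276L)),
          file.path(dir, "rep.A0.t1month.csv"))
write.csv(data.frame(gene = c("SPAP11E10.01", "SPAC20G8.02", "SPBC23G7.12c", "SPBC17G9.09"),
                     count = c(12608L, 1218L, 10306L, 8635L)),
          file.path(dir, "rep.B0.t1month.csv"))

# Hypothetical helper: keep columns 2 (gene) and 3 (count) and name the
# count column after the file, minus its ".csv" extension.
read_counts <- function(fn) {
  x <- read.csv(fn)[, 2:3]
  names(x) <- c("gene", tools::file_path_sans_ext(basename(fn)))
  x
}

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Left join: keep every gene from the first file listed, add each later
# file's counts, NA where a gene is missing from that file.
out <- Reduce(function(a, b) merge(a, b, by = "gene", all.x = TRUE),
              lapply(files, read_counts))
out
```

list.files() returns names in sorted order, so the first file alphabetically drives the left join; reorder files if a different reference file should come first.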
r2evans
  • Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'Reduce': 'names' attribute [3] must be the same length as the vector [2] Called from: h(simpleError(msg, call)) – ucb Aug 29 '21 at 10:52
  • I need them to be left_join – ucb Aug 29 '21 at 10:53
  • Your data structure has changed; I updated my answer. – r2evans Aug 29 '21 at 19:00
  • Hello, sorry about changing it. – ucb Aug 30 '21 at 00:09
  • The two-column table I show contains the two columns I want to include, i.e. columns 2 and 3. The full table is very long (about 3000 genes); I just showed the head of it. – ucb Aug 30 '21 at 00:10