Please help me write R code to combine multiple CSV files on a field that is common to all of them. Each file has a different number of rows (702, 666, etc.), and wherever a value is missing I need NA in its place.

To make this easier to understand, here is what an individual file looks like; I cannot attach multiple files here, so I am pasting just one. `name` is the field common to all the files. In the output, each column name should be prefixed with the name of the file it came from.

        name     projected_leaf_area  treatment  species   g_alias        replicates
    1   A-2:1    215.209              WW         Chickpea  ICCRIL03-0013  2
    2   A-2:2    148.404              WW         Chickpea  ICCRIL03-0119  2
    3   A-2:3    206.566              WW         Chickpea  ICCRIL03-0007  2
    ...
    702 B-2:234  242.06               WW         Chickpea  ICCRIL03-0143  4

Please help me merge the files. Thanks for your time.

eswari
  • Please, format your question properly. Upper-case text has a specific meaning on the internet, it means you're shouting ! So please don't use it just to highlight some sentences, use bold or italic text instead. – digEmAll Feb 23 '15 at 10:29
  • Sorry, I won't use it from now on. I only meant to highlight the text; in future I will use bold or italics. Thank you, and sorry. – eswari Feb 23 '15 at 10:47
  • Please make your problem reproducible. Chatting in comments isn't the way to go. – Roman Luštrik Feb 23 '15 at 19:36
  • @RomanLuštrik Sir, how do I make it a public discussion? Isn't replying in comments how I am supposed to respond to their answers? If there is another way, I am happy to use it. – eswari Feb 24 '15 at 07:42
  • In my view, comments are for clarifying things that should not be in the main question. I think the answers fail to give you out-of-the-box solutions because your question is not clear enough. Please do not feel offended; consider my comment and act on it, or not. – Roman Luštrik Feb 24 '15 at 08:13
  • @RomanLuštrik thanks for your suggestions; I will act accordingly. What is not clear about my question? If you tell me, I will gladly change it. I have already made a few changes based on your comments. Thank you. – eswari Feb 24 '15 at 08:15
  • Provide enough data to demonstrate your problem and show us what the expected result should look like. If there are any borderline cases, point that out, too. To get you started, there are a few tips on how to go about [point one](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Roman Luštrik Feb 24 '15 at 08:30

2 Answers

Load all of the files using read.csv and merge them together using Reduce:

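files <- dir(pattern = "\\.csv$")
d <- lapply(files, read.csv, stringsAsFactors = FALSE)
# Prefix every column except the key with the file name (minus ".csv"),
# so that "name" stays identical in all data frames and merge() can find it.
for (i in seq_along(d)) {
    prefix <- sub("\\.csv$", "", files[i])
    rename <- names(d[[i]]) != "name"
    names(d[[i]])[rename] <- paste0(prefix, ".", names(d[[i]])[rename])
}
Reduce(function(x, y) merge(x, y, by = "name", all = TRUE, sort = FALSE), d)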
Thomas
  • Thank you so much for your time, Thomas. But this combines the CSV files vertically, one below the other. I want them combined horizontally on the common field. – eswari Feb 23 '15 at 10:58
  • You'll have to elaborate. You probably want to use `merge` (or maybe `cbind`) instead of `rbind`, but it depends on exactly how you want to combine rows. – Thomas Feb 23 '15 at 11:35
  • Thomas, `cbind` cannot be used when the row counts differ, right? I tried `merge` like this: `files <- dir(pattern = ".csv")`, `d <- lapply(files, read.csv, stringsAsFactors = FALSE)`, `Q<-do.call("merge (files) by="name",all=TRUE, sort = F)", d)`, and it fails with `Error: unexpected symbol in "Q<-do.call("merge (files) by="name"`. The original question shows exactly what my files look like. I want all my CSV files combined on the `name` field, with the file names in the header. My file names are, for example, s5, s6, s7, s8... – eswari Feb 23 '15 at 12:05
  • So in the combined output I need s5.projected_leaf_area, s5.treatment, s5.species, s5.g_alias, s5.replicates, s6.projected_leaf_area, s6.treatment, s6.species, s6.g_alias, s6.replicates, and so on. I have 50 files in total, and the `name` column is common to all of them. I hope that explains it clearly. – eswari Feb 23 '15 at 12:07
  • Whenever I run the last line (the `Reduce` call) I get the following error: `Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column`, called from `stop(ngettext(sum(bad), "'by' must specify a uniquely valid column", "'by' must specify uniquely valid columns"), domain = NA)`, followed by a `Browse[1]>` prompt. I get this with the new code as well; it wants me to debug `fix.by`. How can I solve this? – eswari Feb 24 '15 at 07:40
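
The `fix.by` error quoted in the last comment is what `merge` raises when the column named in `by` does not exist in one of the data frames, which happens if a renaming loop prefixes every column including the key. The sketch below reproduces that on two small in-memory data frames and then shows one possible fix. The data frames `s5` and `s6`, the `s6` values, and the helper `prefix_cols` are hypothetical and only stand in for the real files.

```r
# Hypothetical stand-ins for two of the CSV files (s6 values are made up).
s5 <- data.frame(name = c("A-2:1", "A-2:2", "A-2:3"),
                 projected_leaf_area = c(215.209, 148.404, 206.566),
                 stringsAsFactors = FALSE)
s6 <- data.frame(name = c("A-2:1", "A-2:3"),
                 projected_leaf_area = c(220.5, 210.1),
                 stringsAsFactors = FALSE)
d <- list(s5 = s5, s6 = s6)

# Prefixing ALL columns also renames the key ("s5.name", "s6.name"),
# so merge(by = "name") can no longer find it and stops with an error.
bad <- lapply(names(d), function(n) setNames(d[[n]], paste0(n, ".", names(d[[n]]))))
err <- tryCatch(
  Reduce(function(x, y) merge(x, y, by = "name", all = TRUE), bad),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical helper: prefix every column except the key.
prefix_cols <- function(df, prefix, key = "name") {
  rename <- names(df) != key
  names(df)[rename] <- paste0(prefix, ".", names(df)[rename])
  df
}
good <- Map(prefix_cols, d, names(d))

# all = TRUE keeps keys that are missing from some inputs and fills with NA.
merged <- Reduce(function(x, y) merge(x, y, by = "name", all = TRUE), good)
print(merged)
```

Keeping the key column unprefixed is the only change that matters here; the rest of the `Reduce`/`merge` pattern stays as in the answer.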
file.data <- lapply(file.names, function(fn){read.csv(fn)})    
common.cols <- Reduce('intersect', lapply(file.data, colnames))
do.call('rbind', lapply(file.data, function(fd) fd[, common.cols]))

Ignore the above. Try this instead:

# file.names: character vector of the CSV files to import
file.names <- dir(pattern = "\\.csv$")
mergeall <- function(x, y) merge(x, y, all = TRUE)
file.data <- lapply(file.names, read.csv)
# example data:
# file.data[[1]] <- data.frame(id = letters[1:4], val1 = 1:4)
# file.data[[2]] <- data.frame(id = letters[1:3], val2 = c(1, NA, 2))
# file.data[[3]] <- data.frame(id = letters[2:5], val3 = factor(letters[1:4]))
Reduce('mergeall', file.data)
#   id val1 val2 val3
# 1  a    1    1 <NA>
# 2  b    2   NA    a
# 3  c    3    2    b
# 4  d    4   NA    c
# 5  e   NA   NA    d
Russ Hyde
  • Thank you so much for your time, Russ. But it shows an error that `file.names` is not found. Could you tell me why? – eswari Feb 23 '15 at 11:00
  • You need to define `file.names <- c(...)` as a character vector of the file names you wish to import. – nathaneastwood Feb 23 '15 at 11:05
  • Natty, could you please elaborate? Sorry, I didn't get your point; I am new to R. My file names are s6, s7, s8, s9, ..., s50. – eswari Feb 23 '15 at 11:15
  • It is similar to how @Thomas has defined them `files <- dir(pattern = ".csv")` – nathaneastwood Feb 23 '15 at 11:33
  • Russ, the source viewer opens in debugging mode on `lapply`: `function (X, FUN, ...) { FUN <- match.fun(FUN) if (!is.vector(X) || is.object(X)) X <- as.list(X) .Internal(lapply(X, FUN)) }`. When I run it I get `Error during wrapup: object of type 'closure' is not subsettable`. Could you tell me how to solve this? – eswari Feb 23 '15 at 11:55
  • @russ hyde, can you please tell me how to use `merge` in your code? I cannot use `cbind`, since my data frames are not the same size. – eswari Feb 24 '15 at 07:41
  • oh, disregard this post, it doesn't do what you need. Next time give us a reproducible testcase – Russ Hyde Feb 24 '15 at 08:55
  • I've edited the code to do what you apparently wanted. Thomas has already given you this solution though – Russ Hyde Feb 24 '15 at 09:05
  • @Russ Hyde thanks for your great help. Your code got me 98% of the way to combining all my CSV files :). The only problem is that it does not keep the order of the common field on which the files are combined. The data in my `name` column looks like A-2:1 A-2:2 A-2:3 A-2:4 A-2:5 ... A-2:232 A-2:233 A-2:234 B-1:1 B-1:2 B-1:3 B-1:4 B-1:5 B-1:6 B-1:7 B-1:8 ... B-1:229 B-1:230 B-1:231 B-1:232 B-1:233 B-1:234. Sorry, I'll continue in the next comment. – eswari Feb 24 '15 at 09:58
  • So I tried `sort = FALSE`. It works to a certain extent, but rows with missing values are pushed to the bottom, separate from the rest. I don't want that; the output has to keep the same order as the common field. Please help me with this. Also, can you tell me how to get the file names into the header? The names of my 100 files contain dates, and I need them in the header to tell which file is which. – eswari Feb 24 '15 at 10:00
  • @RussHyde is there any way to attach my CSV file to this forum, so that it is clear what I am asking for? – eswari Feb 24 '15 at 10:20
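
The two open requests in the last comments (file names in the header, and rows kept in the order of the `name` field) could be handled roughly as follows. This is a minimal sketch, not a tested solution for the real data: it assumes every key has the form `<block>:<number>` (as in `A-2:1`), the two CSV files and the `s6` values are made up and written to a temporary directory, and `read_prefixed` is a hypothetical helper.

```r
# Write two small made-up CSV files to a temporary directory.
tmp <- tempfile("merge_demo")
dir.create(tmp)
write.csv(data.frame(name = c("A-2:1", "A-2:2", "A-2:10", "B-1:1"),
                     projected_leaf_area = c(215.209, 148.404, 206.566, 242.06)),
          file.path(tmp, "s5.csv"), row.names = FALSE)
write.csv(data.frame(name = c("A-2:2", "B-1:1", "B-1:2"),
                     projected_leaf_area = c(150, 240, 230)),
          file.path(tmp, "s6.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and prefix every column except the key
# with the file name (without directory and extension).
read_prefixed <- function(path, key = "name") {
  df <- read.csv(path, stringsAsFactors = FALSE)
  prefix <- tools::file_path_sans_ext(basename(path))
  rename <- names(df) != key
  names(df)[rename] <- paste0(prefix, ".", names(df)[rename])
  df
}

# Full outer join on the key; keys missing from a file get NA in its columns.
merged <- Reduce(function(x, y) merge(x, y, by = "name", all = TRUE),
                 lapply(files, read_prefixed))

# Restore a natural key order after merging: first by the block before the
# colon, then numerically by the number after it, so A-2:2 precedes A-2:10.
block <- sub(":.*$", "", merged$name)
index <- as.integer(sub("^.*:", "", merged$name))
merged <- merged[order(block, index), ]
rownames(merged) <- NULL
print(merged)
```

Ordering after the merge, instead of relying on `sort = FALSE`, keeps rows that are missing from some files in their natural position among the other keys.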