0

I am newbie in R. I got a problem when combining data, hope that someone help to resolve it. Suppose that I have two CSV files such as A.csv and B.csv are located at the path "C:\Users\Public\A". They look like that:

A.csv

T,2015,2016,2017,2018 X1,1,2,3,2 X2,1,2,2,3 X3,1,3,4,2

B.csv

T,2015,2016,2017 X1,2,4,3 X2,2,2,3 X3,3,3,4

And then I try to combine them as well as transpose them with following functions. They are created by Ricardo Oliveros-Ramos at here and by Tony Cookson at here. 1. Firstly, I create function read.tcsv to read and transpose data in CSV file

  read.tcsv = function(file, header=TRUE, sep=",", ...) {
  n = max(count.fields(file, sep=sep), na.rm=TRUE)
  x = readLines(file)

  .splitvar = function(x, sep, n) {
    var = unlist(strsplit(x, split=sep))
    length(var) = n
    return(var)
  }

  x = do.call(cbind, lapply(x, .splitvar, sep=sep, n=n))
  x = apply(x, 1, paste, collapse=sep) 
  out = read.csv(text=x, sep=sep, header=header, ...)
  return(out)

}

2. Then I use multrbind.fill to combine and fill missing value

multrbind.fill = function(mypath){
  filenames=list.files(path=mypath, full.names=TRUE)
  datalist = lapply(filenames, function(x){
    read.tcsv(file=x,header=T)
  }
)
    Reduce(function(x,y) {plyr::rbind.fill(x,y)}, datalist)
}
  1. The result looks good:

ï..T X1 X2 X3 2015 1 1 1 2016 2 2 3 2017 3 2 4 2018 2 3 2 2015 2 2 3 2016 4 2 3 2017 3 3 4

  1. However, I want to add a column as an identifier for each file with their file name (or unique IDs) like that:

ï..T ID X1 X2 X3 2015 A 1 1 1 2016 A 2 2 3 2017 A 3 2 4 2018 A 2 3 2 2015 B 2 2 3 2016 B 4 2 3 2017 B 3 3 4

Someone help me!? Thanks in advance.

2 Answers2

1

Thanks TJ83 for helping me to correct the function. Base on TJ83's comments, I add an identifier column named ID. And here is my complete function read.tcsv

read.tcsv = function(file, header=TRUE, sep=",", ...) {
  n = max(count.fields(file, sep=sep), na.rm=TRUE)
  x = readLines(file)

  .splitvar = function(x, sep, n) {
    var = unlist(strsplit(x, split=sep))
    length(var) = n
    return(var)
  }

  x = do.call(cbind, lapply(x, .splitvar, sep=sep, n=n))
  x = apply(x, 1, paste, collapse=sep)

out = read.csv(text=x, sep=sep, header=header, ...)
out$ID<-tools::file_path_sans_ext(basename(file))
return(out)

} 
0

1) If you only have 2 datasets, then the fastest way would be to add an ID-column on A and B datasets after transformation but before binding the rows.

Dataset_A$ID<-"A"
Dataset_B$ID<-"B"
# Where Dataset_X is the name of your imported transformed datasets.

2) Could you show your exact code used? If you only have 2 datasets I think the code you have created could be simplified substantially. If you are interested in a simplification then please supply us with output from the 2 dput-statements below:

A<-read.csv("A-dataset")
B<-read.csv("B-dataset")
dput(A)
dput(B)
TJ83
  • 71
  • 4
  • In fact, I have more than 350 CSV files in folder A, so I have to use function `multrbind.fill` to combine all of them at once. – Vĩnh Vũ Quang Nov 23 '18 at 21:15
  • I call name of these files by `tools` with `file_path_sans_ext(list.files(file.path(mypath)))`. But, I do not know how to add it into the data frame properly. – Vĩnh Vũ Quang Nov 24 '18 at 00:34
  • Here is the output of my result: `dput(multirbind("C:/Users/Vu Quang Vinh/Desktop/A")) structure(list(ï..T = c(2015L, 2016L, 2017L, 2018L, 2015L, 2016L, 2017L), X1 = c(1L, 2L, 3L, 2L, 2L, 4L, 3L), X2 = c(1L, 2L, 2L, 3L, 2L, 2L, 3L), X3 = c(1L, 3L, 4L, 2L, 3L, 3L, 4L)), class = "data.frame", row.names = c(NA, -7L))` – Vĩnh Vũ Quang Nov 24 '18 at 01:23
  • When you do it all in one step and I don't have access to the csv-files, it is very difficult to reproduce. `read.tcsv(file=x,header=T) }` Between the 2 lines you should add something like `x$ID<-paste(x)` – TJ83 Nov 24 '18 at 01:36
  • You can create 2 CSV files like A.csv and B.csv that I mentioned above. And then using `read.tcsv` as well as `mulrbind.fill` to reproduce the result. I also tried your suggestion but it doesn't work :(. – Vĩnh Vũ Quang Nov 24 '18 at 02:26
  • Last try... Between the two lines: 'out = read.csv(text=x, sep=sep, header=header, ...)' 'return(out)'. Add 'out$ID<-paste(x)' – TJ83 Nov 24 '18 at 03:45
  • The error due to difference rows. `Error in $<-.data.frame(*tmp*, "ID", value = c("T,X1,X2,X3", "2015,1,1,1", : replacement has 5 rows, data has 4` Btw, thank you very much! – Vĩnh Vũ Quang Nov 24 '18 at 04:03