Suppose I only have access to the cbind-ed data.frame r below. Given that the variable names in the original data.frames were the same before binding, is it possible to separate r back into the original data.frames?

Note: this is just a toy example; a functional solution is appreciated.

# Original data.frames:

c1 <- data.frame(study.name = c(1,1,2,3), mod.s=c(3,3,1,2), mod.g=c(1,1,3,1))
c2 <- data.frame(study.name = c(1,1,2,3), mod.s=c(3,3,2,1), mod.g=c(1,2,3,2))

r <- cbind(c1, c2[-1]) # The only available cbind-ed data.frame
rnorouzian
  • It is better not to create a data.frame with duplicate column names – akrun Jul 13 '20 at 18:54
  • This is a delicate, bug-prone issue: whenever you transform the data, the duplicate columns may get a suffix, because data.frame applies `make.unique` to the names – akrun Jul 13 '20 at 18:58
  • The expected output is not entirely clear, but it is better to have unique names, i.e. `lst1 <- list(names(c1), names(c2)); lst2 <- relist(make.unique(do.call(c, lst1)), lst1); names(c1) <- lst2[[1]]; names(c2) <- lst2[[2]]` – akrun Jul 13 '20 at 19:07
  • Yes, I am here. I was just looking for a bug-free method to split it up. Once `r` is made, it carries little information about which data.frame each column came from, unless that can be inferred from column patterns. I would create a `list` of data.frames and use that for further splitting – akrun Jul 13 '20 at 19:21
  • @akrun, Do you possibly know [this](https://stackoverflow.com/questions/63061620/creating-a-sequence-of-dates-with-a-special-format)? – rnorouzian Jul 23 '20 at 19:23
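
The comments above mention `make.unique` and splitting "based on column patterns". A minimal sketch of that idea follows, under one loud assumption that is not guaranteed by `r` itself: the k-th occurrence of a repeated column name is taken to belong to the k-th original data.frame. The names `nm`, `occurrence`, and `parts` are illustrative only.

```r
# Toy data from the question.
c1 <- data.frame(study.name = c(1,1,2,3), mod.s = c(3,3,1,2), mod.g = c(1,1,3,1))
c2 <- data.frame(study.name = c(1,1,2,3), mod.s = c(3,3,2,1), mod.g = c(1,2,3,2))
r  <- cbind(c1, c2[-1])

nm <- names(r)

# Suffixes that make.unique() would add to the repeated names.
unique_names <- make.unique(nm)

# ASSUMPTION: the k-th occurrence of a name comes from the k-th data.frame.
# ave() numbers the occurrences of each name in column order.
occurrence <- ave(seq_along(nm), nm, FUN = seq_along)

# Split the columns of r by occurrence number.
parts <- split.default(r, occurrence)
```

Here `parts[["1"]]` holds the first occurrence of every name and `parts[["2"]]` the second. If the original data.frames did not share the same set of repeated names, this pattern would assign columns incorrectly, so it is only a heuristic.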

1 Answer

If we keep the data.frames in a list and then cbind them, the list records how many columns each piece contributed, which gives a way to identify them afterwards:

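lst1 <- list(c1, c2[-1])    # keep the pieces in a list before binding
r <- do.call(cbind, lst1)
# one group id per column: 1, 1, 1 for c1 and 2, 2 for c2[-1]
split.default(r, rep(seq_along(lst1), sapply(lst1, ncol)))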
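
As a worked check of the approach above, the following sketch runs it on the toy data and rebuilds `c2` by putting the shared `study.name` column back. The names `grp`, `parts`, and `c2_restored` are illustrative, and the sketch assumes the list `lst1` (or at least its column counts) is still available.

```r
# Toy data from the question.
c1 <- data.frame(study.name = c(1,1,2,3), mod.s = c(3,3,1,2), mod.g = c(1,1,3,1))
c2 <- data.frame(study.name = c(1,1,2,3), mod.s = c(3,3,2,1), mod.g = c(1,2,3,2))

lst1 <- list(c1, c2[-1])
r    <- do.call(cbind, lst1)

# Group id for every column of r, taken from the column counts in lst1.
grp   <- rep(seq_along(lst1), sapply(lst1, ncol))
parts <- split.default(r, grp)

# c2 was bound without its first column, so add study.name back.
c2_restored <- cbind(r["study.name"], parts[[2]])
```

Because the grouping vector comes from `lst1` rather than from the column names, this does not depend on how the duplicated names in `r` are handled.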
akrun
  • @rnorouzian yes, we can do that with `split.default(r, names(r))`, which returns a `list` of data.frames, i.e. here you would get `mod.g`, `mod.s` and `study.name` as separate datasets – akrun Jul 13 '20 at 19:36
  • @rnorouzian maybe this works for you: `names(which(sapply(split.default(r[-1], names(r)[-1]), function(x) all(!colSums(!aggregate(.~ study.name, transform(x, study.name = r$study.name), FUN = is.constant)[-1])))))# [1] "mod.s"` – akrun Jul 13 '20 at 19:43
  • @rnorouzian or use `by`: `names(which(sapply(split.default(r[-1], names(r)[-1]), function(x) all(do.call(cbind, by(x, r$study.name, FUN = function(y) apply(y, 2, is.constant)))))))`, which returns `[1] "mod.s"` – akrun Jul 13 '20 at 19:46
  • @rnorouzian `colSums` may be faster, but we are also using `aggregate`, which should slow things down – akrun Jul 13 '20 at 20:04
  • @rnorouzian if you check my code, I used `split.default(r[-1], names(r)[-1])` to remove the grouping column – akrun Jul 13 '20 at 20:36
  • @rnorouzian you are not removing the 'study.name' column in the `split.default`. It can be removed with `i1 <- colnames(r) != 'study.name'; split.default(r[i1], names(r)[i1])` – akrun Jul 13 '20 at 20:50
  • @rnorouzian have you checked my updated comment from earlier? `i1 <- colnames(r) != 'study.name'; split.default(r[i1], names(r)[i1])` – akrun Jul 13 '20 at 20:54
  • @rnorouzian yes, I was assuming that you would be removing the duplicate 'study.name' column anyway – akrun Jul 13 '20 at 20:57
  • @rnorouzian Here `is.constant` returns a logical vector, so I assume it would work unless there is an issue with that logical vector, i.e. whether it contains any NA. If there are NAs, you can add `na.rm = TRUE` in `colSums` – akrun Jul 14 '20 at 18:51
  • Hi Arun, do you know a solution to [this](https://stackoverflow.com/questions/63182133/reading-a-por-dataset-into-r-from-google-drive)? – rnorouzian Jul 30 '20 at 22:18
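
The name-based split from the comments above (`i1 <- colnames(r) != 'study.name'; split.default(r[i1], names(r)[i1])`) can be tried on the toy data as a small sketch. The names `keep` and `by_mod` are illustrative. Note that the grouping vector is taken from `names(r)` rather than from the names of the subset, since subsetting a data.frame with duplicated names may append suffixes to them.

```r
# Toy data from the question.
c1 <- data.frame(study.name = c(1,1,2,3), mod.s = c(3,3,1,2), mod.g = c(1,1,3,1))
c2 <- data.frame(study.name = c(1,1,2,3), mod.s = c(3,3,2,1), mod.g = c(1,2,3,2))
r  <- cbind(c1, c2[-1])

# Drop the grouping column, then split the remaining columns by their name.
keep   <- names(r) != "study.name"
by_mod <- split.default(r[keep], names(r)[keep])

# One data.frame per moderator, each holding that moderator from c1 and c2.
names(by_mod)
```

Each element of `by_mod` collects the copies of one variable side by side, which is the shape the `is.constant` checks in the comments operate on. This groups columns by variable rather than by source data.frame, so it answers a different question than recovering `c1` and `c2`.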