How can I merge files that have a different number of rows? The columns are always the same (same count, same names), but the number of rows differs from file to file.

I need to read them with UTF-8 encoding, but I couldn't get that to work inside `map_df()`. `rbind` is not working either; it apparently needs the same number of rows.

Any suggestions?

rainbowthug
  • This sounds like a "merge" operation, just like you said, but that's not done with `rbind`. Look up `merge` (base R) and the `dplyr::*_join` functions. It's a different way of data-munging; I suggest https://stackoverflow.com/q/1299871/3358272 and https://stackoverflow.com/a/6188334/3358272 as a couple of good references for figuring out whether you need "left", "right", "full", etc. If you want anything more than this, you'll need to [edit] your question and add sample data and expected output. – r2evans Jan 25 '21 at 16:21
  • Hey, `full_data <- dir("path", full.names = T) %>% map_df(read.xlsx, sheetIndex = 2)` is enough, but it doesn't handle UTF-8 encoding, and I couldn't pull off `encoding = "UTF-8"` in `map_df()`. When I try to expand the `read.xlsx` call, which supports an `encoding` parameter, as in `full_data <- dir("path", full.names = T) %>% map_df(read.xlsx(encoding = "UTF-8",sheetIndex = 2))`, R claims I didn't provide the file. That is kind of true, but I don't know how to provide the file here, since the file list is coming through the pipe. – rainbowthug Jan 26 '21 at 12:36
  • `... %>% map_df(~ read.xlsx(.x, encoding = "UTF-8", sheetIndex = 2))` – r2evans Jan 26 '21 at 13:14
  • Thanks, it works as intended. Can you describe what the tilde before `read.xlsx` does? Also, does the dot before the file argument mean that it is "just an argument", a placeholder that lets the user set the other parameters without worrying about this one? Or how should I understand it? – rainbowthug Jan 26 '21 at 13:43
  • Start with https://purrr.tidyverse.org/articles/other-langs.html. With `rlang` (used by `dplyr`, `tidyr`, `purrr`, etc.), a one-sided formula starting with `~` is shorthand for an anonymous function. The `.x` is replaced with the argument. When using `map2_*` (two-arg variants), there is both `.x` (first) and `.y` (second). – r2evans Jan 26 '21 at 14:23
  • `~ read.xlsx(.x, ...)` is equivalent to (and can actually be replaced by) `function(.x) read.xlsx(.x, ...)`. – r2evans Jan 26 '21 at 14:32
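
To make the comments above concrete, here is a minimal sketch of the same pattern. It is not the asker's actual setup: it assumes `purrr` and `dplyr` are installed, and it swaps `read.xlsx` for base R's `read.csv` (with `fileEncoding = "UTF-8"`) so that it runs without any Excel files. The temporary directory, file names, and sample columns are all made up for illustration.

```r
library(purrr)  # map_df() also needs dplyr installed for the row binding

# Hypothetical sample data: three CSV files with the same columns but
# different numbers of rows, written to a temporary directory.
tmp_dir <- file.path(tempdir(), "merge_demo")
dir.create(tmp_dir, showWarnings = FALSE)
row_counts <- c(2, 5, 3)
for (i in seq_along(row_counts)) {
  n <- row_counts[i]
  df <- data.frame(id = seq_len(n), name = rep(paste0("file", i), n))
  write.csv(df, file.path(tmp_dir, paste0("file", i, ".csv")),
            row.names = FALSE, fileEncoding = "UTF-8")
}

files <- dir(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Formula shorthand: .x stands for the current file path, so extra
# arguments such as the encoding can be written inside the call.
via_formula <- map_df(files, ~ read.csv(.x, fileEncoding = "UTF-8"))

# The same call written as an explicit anonymous function.
via_function <- map_df(files, function(.x) read.csv(.x, fileEncoding = "UTF-8"))

nrow(via_formula)                      # total rows across all files
identical(via_formula, via_function)   # do both forms give the same result?
```

With the three sample files, the stacked result should have 2 + 5 + 3 = 10 rows. Row binding only needs the columns to match, so differing row counts are not a problem here.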
