0

I would like to merge at least two tables using a DateTimeStamp column of the format "%Y-%m-%d %H:%M:%OS" (e.g. "28-Jun-2019 20:50:55.123"). Although at different positions, the DateTimeStamp column is present in all tables, but the interval information and all other columns differ between tables. Every table is in a different tab-separated .txt-file. I would like to combine the information of all tables into one table.

I have:

    A.txt:
    DateTimeStamp              AEvents
    28-Jun-2019 20:50:55.123   X
    28-Jun-2019 20:53:52.237   Y
    ...
    B.txt:
    BEvents DateTimeStamp
    Z       28-Jun-2019 20:52:23.328
    W       28-Jun-2019 20:51:41.962
    ...

I would like to get:

    AB.txt:
    DateTimeStamp              AEvents BEvents
    28-Jun-2019 20:50:55.123   X
    28-Jun-2019 20:51:41.962           W
    28-Jun-2019 20:52:23.328           Z
    28-Jun-2019 20:53:52.237   Y
    ...
camille
  • 16,432
  • 18
  • 38
  • 60
VGF
  • 55
  • 8
  • Please try `combined<- merge(A.txt, B.text,by=c("DateTimeStamp"))` – Mohamed Rahouma Nov 18 '19 at 16:37
  • `dplyr::bind_rows(A.txt, B.txt)` – Matt Nov 18 '19 at 16:43
  • "Merge" often implies a join, like that there are timestamps present in both data frames, and you want to combine different sets of information along each timestamp. This looks instead like just binding two data frames that have different columns; `dplyr::bind_rows` (and likely some other functions) does that, as would adding the missing empty columns to each data frame, then using the base `rbind` – camille Nov 18 '19 at 17:27
  • Does this answer your question? [Combine two data frames by rows (rbind) when they have different sets of columns](https://stackoverflow.com/questions/3402371/combine-two-data-frames-by-rows-rbind-when-they-have-different-sets-of-columns) – camille Nov 18 '19 at 17:28
  • Thank you. This solved my problem. – VGF Jul 03 '20 at 12:27

1 Answers1

2

You can use bind_rows

library(tidyverse) 

A.txt %>% 
  bind_rows(., B.txt) %>% 
  mutate_at(vars(AEvents, BEvents), ~replace_na(., "")) %>% 
  arrange(DateTimeStamp)

#             DateTimeStamp AEvents BEvents
#1 28-Jun-2019 20:50:55.123       X        
#2 28-Jun-2019 20:51:41.962               W
#3 28-Jun-2019 20:52:23.328               Z
#4 28-Jun-2019 20:53:52.237       Y     

Matt
  • 2,947
  • 1
  • 9
  • 21
  • What if I have a lot of "vars" (not only "AEvents" and "BEvents") - could I include all those columns from all files but the "DateTimeStamp" column? – VGF Nov 21 '19 at 12:24
  • @W.H.G. you can pass more than two `df`s into `bind_rows`, if that is what you mean. You could also create a `list` of all of your `Events` and then pass it through `bind_rows` – Matt Nov 21 '19 at 12:48
  • E.g. I passed 5 dataframes into bind_rows, but one of them has 16 columns which I would like to include but not write all those column names into vars(). – VGF Nov 21 '19 at 13:01
  • @W.H.G Would be easier with an example, and this may be a new question. However, you can use `select_helpers` within `vars` for example `vars(-one_of("DateTimeStamp")` which would select all `vars` that do not contain `DateTimeStamp` (you can include more than one variable). It really depends on what your data looks like. – Matt Nov 21 '19 at 13:08