0

I want to combine the data so that it outputs the same 15 rows (matched by "column 1", only the additional columns not present in df2, appear at the end of what would have otherwise been df1.

Some columns may contain matching values between the two data frames, and in those cases im happy not for those columns to merge across.

Example:

df1                         
Column 1 Column 2 Column 3.      
1a       Blue.    15
2b       Red.     20
3c       Green.   25

df2
Column 1 Column 3 Column 4.
1a       15       Apple 
2b       20       Pear
3c       25.      Orange

df3 (new Dataframe)
Column 1 Column 2 Column 3. Column 4. 
1a       Blue.    15        Apple
2b       Red.     20        Pear  
3c       Green.   25        Orange 

The following does not work because where originally both data sets have 10 rows, once combined the new dataset has 30 rows.

merge.df3 <- merge(df1,df2, by  = "column 1")
Mark
  • 7,785
  • 2
  • 14
  • 34
JAK15
  • 1
  • 1

1 Answers1

0

"column1" won't match with "Column 1". You need to use the right capitalisation:

df1 <- data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
        `Column 1` = c("1a", "2b", "3c"),
        `Column 2` = c("Blue.", "Red.", "Green."),
        `Column 3` = c(15L, 20L, 25L))
df2 <- data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
        `Column 1` = c("1a", "2b", "3c"),
        `Column 3` = c(15, 20, 25),
        `Column 4` = c("Apple", "Pear", "Orange")
)

merge(df1, df2, by = c("Column 1", "Column 3")) # gives correct answer

Here is the documentation, and here a basic tutorial on how merge works.

Mark
  • 7,785
  • 2
  • 14
  • 34
  • "merge(df1, df2, by = "Column 1") # gives correct answer" - For some reason this is not working exactly as it should. It's creating a double up of rows in my new Dataframe. I now have two 1a rows and 2b rows. Any reason why? – JAK15 Jul 16 '23 at 00:33
  • updated it. Please run the entire thing (including creating the dataframes) and see what you get – Mark Jul 16 '23 at 11:32
  • Hi Mark, it did not work and I think it might be because there are multiple (two) identical values in each Dataframe under Column 1. So for example, 1a is listed twice, 2b is listed twice, and in both datasets. When they are merged, the new Dataframe includes four 1a's, instead of just two 1a's. I assume I need to combine based on two conditions as opposed to just the one, so that it merges with the correct row. Does that make sense? – JAK15 Jul 21 '23 at 04:06
  • @JAK15 in your example, 1a and 1b aren't listed twice. You can't expect me to answer a question if you don't give the correct input – Mark Jul 21 '23 at 04:22
  • sorry... I only realised that might be important information when it wasn't working correctly. I am stuck. Do you know how to resolve the issue? The new Dataframe just seems to create additional observations instead of adding the second Dataframe columns to the original Dataframe rows. PS. how did you create a beautiful table of data when you cleaned up my original question? – JAK15 Jul 21 '23 at 04:54
  • @JAK15 put the correct information into the question. – Mark Jul 21 '23 at 05:03
  • I figured it out... df3 <- df1 %>% left_join(df2, by=c('column 1'='column 1', 'column 3'='column 3')) Thanks for the help though – JAK15 Jul 22 '23 at 00:47