1

I am fairly new to R but really have tried to find an answer to my problem, but was unsuccessful.

I have two data frames "Brexit_final" and "Brexit_Google_Trends". Both data frames have a "Date" column BUT! the Brexit_Final frame has less dates than the other one. I want to make a new set of data in which only the rows are kept where both frames have the date.

And in the process I also want to delete a lot of the columns.

Brexit_Final
Date        Remain   Leave  Undecided  Total_Difference
2016-06-18  42       44     13         7.5
2016-06-20  47.25    46     5.25       15
2016-06-23  55       45     0          14

Brexit_Google_Trends
Date        EU       Referendum  Brexit  Difference
2016-06-18  44       100         65      22
2016-06-19  23       100         62      55
2016-06-20  28       40          36      24
2016-06-21  37       55          43      36
2016-06-22  7        10          55      44
2016-06-23  67       100         62      103

Dream_Frame
Date        Total_Difference  Difference
2016-06-18  7.5               22
2016-06-20  15                24
2016-06-23  14                103
halfer
  • 19,824
  • 17
  • 99
  • 186
Felix
  • 23
  • 2

4 Answers4

2

You can use an inner_join from the dplyr package.

inner_join(Brexit_Final, Brexit_Google_Trends, by = "Date") %>% select(Total_Difference,  Difference)
student
  • 1,001
  • 2
  • 12
  • 24
  • Thanks a lot! Turns out the problem wasn't my approach but rather that I had the wrong data type in my data frame. But when I used your suggestion an error message popped up which led me to the right conclusion. Thank you so much. Would have never found the solution without you. – Felix Mar 22 '17 at 20:48
1

From this canonical question, we get:

Dream_Frame <- merge(Brexit_Final, Brexit_Google_Trends, by = "Date")
Dream_Frame <- Dream_Frame[,c("Date", "Total_Difference", "Difference")

Or, to do it in one step,

Dream_Frame <- merge(Brexit_Final[, c("Date", "Total_Difference")], 
                     Brexit_Google_Trends[, c("Date", "Difference")],
                     by = "Date")
Community
  • 1
  • 1
BLT
  • 2,492
  • 1
  • 24
  • 33
1
Brexit_Final = Brexit_Final[,c("Date","Total_Difference")]
Brexit_Google_Trends = Brexit_Google_Trends[,c("Date","Difference")]
Dream = merge(Brexit_Final, Brexit_Google_Trends,by="Date")
Aleksandr
  • 1,814
  • 11
  • 19
0

Used the suggestion from "student"

inner_join(Brexit_Final, Brexit_Google_Trends, by = "Date") %>% select(Date, Total_Difference, Difference)

With the slight addition of adding in the "Date" as a column to keep. If anybody else is struggling with this. A problem in my data frame was that the "Difference" and "Total_Difference" were not in numeric format but rather also a data frame I attached to the others. So I used:

Brexit_final$Total_Difference <- as.numeric(Brexit_final$Total_Difference[[1]])

And the same for "Difference" to make them numeric first. Then all the provided solutions worked.

Thanks for your help @all

Felix
  • 23
  • 2