1

I have around 8-10 data frames, each with two columns: Date and Value.

Data Frame 1 - df1

library(readr)
df1 <- read_table("Date Value_df1
           01-01-2020 1200
           02-01-2020 1300
           03-01-2020 1240
           06-01-2020 3900
           31-12-2020 2000")

Second data frame - df2 (starts from a different date and ends in the last month of 2020)

df2 <- read_table("Date Value_df2
        03-01-2020 120
        04-01-2020 130
        06-01-2020 140
        06-01-2020 150
        08-01-2020 1657
        30-12-2020 6000")

and df3

df3 <- read_table("Date Value_df3
            03-01-2020 120
            04-02-2020 130
            06-02-2020 140
            06-02-2020 150
            08-03-2020 1657
            30-12-2020 6000")

I used this code to fill in the missing dates for data frame 1, so that df1 becomes a continuous daily series with all NAs in the value column set to 0:

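library(dplyr)
df1 %>%
  mutate(Date = as.Date(Date, format = "%d-%m-%Y")) %>%  # Date is read in as character
  tidyr::complete(Date = seq.Date(min(Date), max(Date), by = "day"), fill = list(Value_df1 = 0))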

I want to add the second column of each of the other data frames to this first data frame, matched by date. The final data frame should look like this:

    Date         Value_df1   Value_df2   Value_df3   Value_df4 ..... Value_df8
    01.01.2020   1200        0
    02.01.2020   1300        0
    03.01.2020   1240        120
    04.01.2020   0           130
    05.01.2020   0           140
    06.01.2020   3900        150
    07.01.2020   0           0
    08.01.2020   0           1657
    09.01.2020   0           0
    10.02.2020   1500        0
    .
    .
    .
    30.12.2020   0           6000
    31.12.2020   2000        0

I hope my question is clear. Can anyone help me with this? How can I add the columns according to the dates of the first data frame?

Bella_18

2 Answers

3

You could use the sqldf package and write it as two SQL statements: one to get all unique dates and one to join the needed values onto them:

library(sqldf)

df1 <- some_data
df2 <- your_other_data

# Get the distinct dates across both data frames.
# Use UNION rather than UNION ALL so each date is returned only once.

date_query <- 'select date from df1
               union
               select date from df2'

unique_date <- sqldf(date_query)

# Left join df1 and df2 onto the dates, so that dates
# with no matching value are kept (as NA).

final_query <-
'select u.date
,d1.value_df1
,d2.value_df2
from unique_date as u
left join df1 as d1 on u.date = d1.date
left join df2 as d2 on u.date = d2.date'

final_data <- sqldf(final_query)
 

After this code runs, it's up to you if you want to set NAs to zero, etc.
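
As a minimal sketch of that last step, assuming the joined result has numeric value columns with NA wherever a date had no match, base R can replace the NAs in place. The data frame below is a made-up stand-in for `final_data`, not actual output from the queries above.

```r
# Hypothetical stand-in for the joined result; dates kept as character.
final_data <- data.frame(
  Date      = c("01-01-2020", "03-01-2020", "04-01-2020"),
  Value_df1 = c(1200, 1240, NA),
  Value_df2 = c(NA, 120, 130)
)

# Replace NA with 0 in every column whose name starts with "Value".
value_cols <- grep("^Value", names(final_data), value = TRUE)
final_data[value_cols] <- lapply(final_data[value_cols], function(x) {
  x[is.na(x)] <- 0
  x
})

final_data
```

Restricting the replacement to the value columns leaves the date column untouched.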

  • @mighty_frosty_dragonsbane I tried running your code with the inputs from the question, but instead of being merged, the two data frames are concatenated, with df2 below df1. Can you please check on your side? – Bella_18 Sep 08 '22 at 13:54
  • @bella_pa you are right. What I would do is make a separate dataframe containing the unique dates of each and then join them using sqldf(). I'll edit my answer. – mighty_frosty_dragonsbane Sep 15 '22 at 19:05
2

First, put your tables into a list so that Reduce can apply full_join across all of them. Then replace the NAs with 0s (here using coalesce inside mutate-across).

library(dplyr)
#library(lubridate)

list(df1, df2, df3) |>
  Reduce(full_join, x = _) |>
  mutate(across(starts_with("Value"), \(x) coalesce(x, 0))) |>
  arrange(lubridate::dmy(Date))

If Date is already a date-type, you can use arrange(Date) and avoid lubridate.

Output:

# A tibble: 13 × 4
   Date       Value_df1 Value_df2 Value_df3
   <chr>          <dbl>     <dbl>     <dbl>
 1 01-01-2020      1200         0         0
 2 02-01-2020      1300         0         0
 3 03-01-2020      1240       120       120
 4 04-01-2020         0       130         0
 5 06-01-2020      3900       140         0
 6 06-01-2020      3900       150         0
 7 08-01-2020         0      1657         0
 8 04-02-2020         0         0       130
 9 06-02-2020         0         0       140
10 06-02-2020         0         0       150
11 08-03-2020         0         0      1657
12 30-12-2020         0      6000      6000
13 31-12-2020      2000         0         0
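
The output above contains only dates that occur in at least one table. If every calendar day is needed, as in the question's expected result, one possible extension (a sketch, not part of the original answer) is to parse `Date` with `as.Date()` and then expand it with `tidyr::complete()`. The small inline tables are shortened stand-ins for df1 and df2, and `to_date` is a helper name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Shortened stand-ins for df1 and df2 (dates as dd-mm-yyyy strings).
df1 <- tibble(Date = c("01-01-2020", "03-01-2020", "06-01-2020"),
              Value_df1 = c(1200, 1240, 3900))
df2 <- tibble(Date = c("03-01-2020", "04-01-2020"),
              Value_df2 = c(120, 130))

# Parse the strings into Date objects so they can be sequenced and sorted.
to_date <- function(df) mutate(df, Date = as.Date(Date, format = "%d-%m-%Y"))

result <- list(df1, df2) |>
  lapply(to_date) |>
  Reduce(f = function(x, y) full_join(x, y, by = "Date")) |>
  # Add one row per missing day between the first and last date.
  complete(Date = seq(min(Date), max(Date), by = "day")) |>
  # Fill the NAs created by the join and by complete() with 0.
  mutate(across(starts_with("Value"), \(x) coalesce(x, 0)))

result
```

With real data, the same pipeline should work on the full list of data frames, as long as each one has a `Date` column in the same text format.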

Data:

library(readr)

df1 <- read_table("Date Value_df1
           01-01-2020 1200
           02-01-2020 1300
           03-01-2020 1240
           06-01-2020 3900
           31-12-2020 2000")

df2 <- read_table("Date Value_df2
            03-01-2020 120
            04-01-2020 130
            06-01-2020 140
            06-01-2020 150
            08-01-2020 1657
            30-12-2020 6000")

df3 <- read_table("Date Value_df3
            03-01-2020 120
            04-02-2020 130
            06-02-2020 140
            06-02-2020 150
            08-03-2020 1657
            30-12-2020 6000")
harre
  • Use `arrange`, see update :) – harre Sep 06 '22 at 14:51
  • Thanks again. I am getting an error when I run the code with your input data. The error is: Error in `across()`: ! Must be used inside dplyr verbs. Run `rlang::last_error()` to see where the error occurred. – Bella_18 Sep 06 '22 at 15:14
  • Do you have any other packages loaded? Try to restart R. – harre Sep 06 '22 at 15:18
  • I tried again @harre. Can you tell me what the "_" in reduce(full_join, x=_) means? – Bella_18 Sep 07 '22 at 09:21
  • It's a placeholder for the left-hand side of the pipe; the code is equivalent to: `Reduce(full_join, x = list(df1, df2, df3))`. – harre Sep 07 '22 at 10:12
  • The data frames cannot be put in a list. This is the error I get: Error in match.fun(f) : 'list(df1, df2, df3)' is not a function, character or symbol – Bella_18 Sep 08 '22 at 13:57
  • Does the code run for you on the data I provided? I will suggest that you try to restart R. – harre Sep 08 '22 at 14:14
  • + your `tidyr::complete`-code is not needed with my approach. – harre Sep 08 '22 at 14:26
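
To illustrate the `_` placeholder discussed in the comments, here is a small base R sketch. It assumes R 4.2 or later, where the native pipe supports a named-argument placeholder, and it uses a toy list of numbers rather than data frames. Without the placeholder, the pipe passes the list as the first argument of `Reduce()`, which is the function `f`, not the list `x`.

```r
# Toy input: a list of numbers instead of data frames.
nums <- list(1, 2, 3)

# With the placeholder, the piped list is passed as the argument x.
piped <- nums |> Reduce(`+`, x = _)

# Equivalent call without the pipe.
direct <- Reduce(`+`, nums)

identical(piped, direct)  # TRUE

# Without the placeholder the list lands in f, so Reduce() fails.
failed <- tryCatch(
  { nums |> Reduce(`+`); FALSE },
  error = function(e) TRUE
)
failed  # TRUE
```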