
I am trying to create a data frame that contains a) a time span of 51 days (the period of the Corona lockdown) and b) the frequencies of tweets within this time span. The problem is that tweets were not posted on every day, so some dates are missing from the frequency table. To continue and calculate some correlations, I need a data frame that has a value (or a missing value) for every single day of the time span. How can I achieve this? Is there another way to calculate the frequencies, or a way to bind the data together?

LockdownDays <- seq.Date(from = as.Date('2020-03-19'), to = as.Date('2020-05-08'), by = 'days') ## the date vector that contains all the dates I need values for

frequencyD <- table(ThemaD$date) ## the frequencies calculated from the tweets dataset

As I said, the problem is that:

  • The two objects have different lengths.
  • Each frequency value has to be matched to the right date in the LockdownDays vector.

So in the end I want a data frame with the dates on the x axis and the frequencies on the y axis. If there is no frequency for a day, I still want that date in the data frame, ideally with 0 or NA as the y value.

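# Toy example: one entry per tweet
df <- c("2020-01-02", "2020-01-03", "2020-01-03", "2020-01-05")
freq <- table(df)
# All the days I need a value for
dates <- seq.Date(from = as.Date('2020-01-01'), to = as.Date('2020-01-08'), by = 'days')
print(dates)
# My attempt at binding them together; the lengths differ
cbind(freq, df)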
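For the toy example above, one possible base R sketch (not necessarily the approach taken in the answer below) is to turn the tweet dates into a factor whose levels are all the days of interest, so that `table()` reports a 0 for days without tweets. The names `tweet_dates`, `all_days`, `counts` and `result` are made up for this illustration.

```r
# Toy data from the question: one entry per tweet, stored as date strings
tweet_dates <- c("2020-01-02", "2020-01-03", "2020-01-03", "2020-01-05")

# Every day that should appear in the result
all_days <- seq.Date(from = as.Date("2020-01-01"), to = as.Date("2020-01-08"), by = "days")

# Using every day as a factor level makes table() count 0 for days without tweets
counts <- table(factor(tweet_dates, levels = as.character(all_days)))

# One row per day, with the count next to its date
result <- data.frame(date = all_days, freq = as.vector(counts))
print(result)
```

Because the factor levels are in the same order as `all_days`, the counts line up with the dates without any explicit matching step.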
  • Welcome to Stack Overflow! Can you please give a minimal reproducible example in your question? – jogo Sep 08 '20 at 13:14
  • If you need some help, [please check this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). For example, to produce a minimal data set, you can use `head()` or `subset()`, then use `dput()` to give us something that can be put into R immediately. Alternatively, you can use base R datasets such as `mtcars`, `iris`, *etc*. – Paul Sep 08 '20 at 13:16
  • Also, have you checked functions from `dplyr`? Such as `left_join()`, `right_join()`, `inner_join()`, `full_join()`? See ?`mutate-joins` for further info. – Paul Sep 08 '20 at 13:22
  • I tried them, but they produced an error. I was also not quite sure which function to choose. – Rennacker54 Sep 08 '20 at 13:33
  • Try `dput(head(ThemaD$date,30))`. The picture you just showed is about a `Var1` and a `Freq` that are not even in the question. Also: Pictures are a bad way to communicate rows of numeric data... – Bernhard Sep 08 '20 at 13:34
  • ```> dput(head(ThemaD$date,30)) structure(c(18341, 18343, 18343, 18344, 18344, 18344, 18345, 18345, 18345, 18345, 18348, 18350, 18351, 18352, 18357, 18357, 18358, 18358, 18358, 18360, 18360, 18360, 18360, 18362, 18362, 18362, 18362, 18362, 18362, 18362), class = "Date") ``` I don't see how this solves the problem, sorry, I am a complete newbie – Rennacker54 Sep 08 '20 at 14:04

1 Answer


You can put your dates and your frequency table into data frames and then use dplyr::left_join to achieve what you want:

library(dplyr)

# OP's data
LockdownDays <-  seq.Date(from = as.Date('2020-03-19'), to = as.Date('2020-05-08'), by = 'days')
ThemaD <- tibble(
    date = structure(c(18341, 18343, 18343, 18344, 18344, 18344, 18345,  18345, 18345, 18345, 18348, 18350, 18351, 18352, 18357, 18357,  18358, 18358, 18358, 18360, 18360, 18360, 18360, 18362, 18362,  18362, 18362, 18362, 18362, 18362), class = "Date") 
)
frequencyD <- table(ThemaD$date)

The left_join solution, which keeps every day in LockdownDays and leaves Freq as NA on days without tweets:

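# Start from the full date range so that every lockdown day gets a row
df <- tibble(
    date = LockdownDays
  ) %>%
  left_join(
    # as.data.frame() on a table yields the columns Var1 (a factor) and Freq
    as.data.frame(frequencyD) %>%
      mutate(Var1 = as.Date(Var1)),
    by = c(date = 'Var1')
  )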
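If 0 is preferred over NA for days without tweets, a follow-up step can fill the gaps after the join. The sketch below is one possible variant, not part of the original solution: it counts tweets per day with `dplyr::count()` instead of `table()`, then replaces the missing counts with `coalesce()`. The names `tweet_dates`, `daily` and `df0` are made up for this illustration, and the dates are the 30 values posted in the comments.

```r
library(dplyr)

# All lockdown days, as in the question
LockdownDays <- seq.Date(from = as.Date("2020-03-19"), to = as.Date("2020-05-08"), by = "days")

# The first 30 tweet dates from the dput() output in the comments
tweet_dates <- structure(
  c(18341, 18343, 18343, 18344, 18344, 18344, 18345, 18345, 18345, 18345,
    18348, 18350, 18351, 18352, 18357, 18357, 18358, 18358, 18358, 18360,
    18360, 18360, 18360, 18362, 18362, 18362, 18362, 18362, 18362, 18362),
  class = "Date"
)

# One row per date that has at least one tweet; the count goes into Freq
daily <- tibble(date = tweet_dates) %>%
  count(date, name = "Freq")

# Keep every lockdown day, then turn the NA counts into 0
df0 <- tibble(date = LockdownDays) %>%
  left_join(daily, by = "date") %>%
  mutate(Freq = coalesce(Freq, 0L))

print(df0, n = 10)
```

Whether 0 or NA is the better fill depends on the analysis: 0 states that nobody tweeted on that day, while NA marks the day as unobserved, and NA values are typically dropped or propagated by correlation functions.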
alex_jwb90