1

Probably an extension to : De-aggregate / reverse-summarise / expand a dataset in R I have a dataframe that looks like :

I_Code  Date_1  Date_2
2   14/09/2019  16/08/2019
2   14/09/2019  17/08/2019
2   14/09/2019  19/08/2019
2   14/09/2019  20/08/2019
2   14/09/2019  21/08/2019
2   14/09/2019  21/08/2019
2   14/09/2019  21/08/2019
2   14/09/2019  22/08/2019
2   14/09/2019  23/08/2019
2   14/09/2019  23/08/2019
2   14/09/2019  24/08/2019
2   14/09/2019  27/08/2019
2   14/09/2019  28/08/2019
2   14/09/2019  28/08/2019
2   14/09/2019  29/08/2019
2   14/09/2019  04/09/2019
2   14/09/2019  04/09/2019
2   14/09/2019  04/09/2019
2   14/09/2019  05/09/2019
2   14/09/2019  08/09/2019
2   14/09/2019  10/09/2019
2   14/09/2019  10/09/2019
2   14/09/2019  12/09/2019

I_code can take 5 values.

I have another dataframe that looks like :

date_2  count
20/09/2019  415
19/09/2019  431
31/08/2019  386
24/09/2019  404
11/08/2019  252
27/09/2019  441
28/09/2019  398
17/09/2019  430
07/09/2019  388
10/09/2019  369
22/08/2019  318
25/09/2019  420
25/08/2019  380
17/08/2019  291
01/09/2019  381
30/08/2019  345
22/09/2019  455
07/09/2019  388
09/08/2019  213
24/09/2019  404
23/08/2019  344
17/08/2019  291
07/09/2019  388

here the count represents the frequency of the date_2.

I want to expand the dataframe1 to 20k entries based on frequency of date_2 from dataframe2.

for each date_2 in dataframe 1, we have 5 I_code associated. I want to distribute them equally into the dataframe.

for example :

20/09/2019 has a total of 415 count, then we need to have 415 entries of (I_code, Date_1, Date_2). There might be multiple tuple entries for 3 fields in the dataframe1. we need to pick all the values associated with date_2 and then expand all the entries based on the frequency of date_2 in dataframe2.

Can anyone help.

dper
  • 884
  • 1
  • 8
  • 31
  • You don't have all of the dates from df1 in df2. What is the expected behavior in such case? Also, if you have a date multiple times in df1, you want to replicate each according the count in df2 (meaning, let's say 3*200)? – tmfmnk Oct 23 '19 at 06:17
  • @tmfmnk I have all of the dates from df1 in df2. df1 might have multiple dates based on I_Code and date_1. for example df1 has lets say 10 entries of date_2 and it has a count of 400 in df2. then we would have (400/10) = 40 entries for each of the 10 entries in df1. – dper Oct 23 '19 at 06:20
  • @tmfmnk I have just pasted a sample in the question, the dataset would not have extra dates. – dper Oct 23 '19 at 06:23

0 Answers0