0

Hi everyone I am fairly new here. I m having trouble figuring out how to combine the same dates in the same column. Some Ids contain both the same dates and different ones. My goal was to only combine the ones with the same dates. Also, let me say that this df is a result of some basic cleaning and grouping of the originals.

    library(dplyr)

str(notimesw)

grouped_df [480 × 5] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ Id                : num [1:480] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
 $ date              : Date[1:480], format: "2016-04-12" "2016-04-13" ...
 $ WeightPounds      : num [1:480] NA NA NA NA NA NA NA NA NA NA ...
 $ TotalSleepRecords : num [1:480] 1 2 1 2 1 1 1 1 1 1 ...
 $ TotalMinutesAsleep: num [1:480] 327 384 412 340 700 304 360 325 361 430 ...
 - attr(*, "groups")= tibble [26 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ Id   : num [1:26] 1.50e+09 1.64e+09 1.84e+09 1.93e+09 2.03e+09 ...
  ..$ .rows: list<int> [1:26] 
  .. ..$ : int [1:27] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ : int [1:4] 26 27 28 29
  .. ..$ : int [1:3] 30 31 32
  .. ..$ : int [1:6] 33 34 35 36 37 416
  .. ..$ : int [1:28] 38 39 40 41 42 43 44 45 46 47 ...
  .. ..$ : int 66
  .. ..$ : int [1:15] 67 68 69 70 71 72 73 74 75 76 ...
  .. ..$ : int [1:2] 417 418
  .. ..$ : int [1:28] 82 83 84 85 86 87 88 89 90 91 ...
  .. ..$ : int [1:8] 110 111 112 113 114 115 116 117
  .. ..$ : int [1:28] 118 119 120 121 122 123 124 125 126 127 ...
  .. ..$ : int [1:24] 144 145 146 147 148 149 150 151 152 153 ...
  .. ..$ : int [1:28] 168 169 170 171 172 173 174 175 176 177 ...
  .. ..$ : int [1:10] 196 197 198 199 200 421 422 423 424 425
  .. ..$ : int [1:28] 201 202 203 204 205 206 207 208 209 210 ...
  .. ..$ : int [1:31] 229 230 231 232 233 234 235 236 237 238 ...
  .. ..$ : int [1:27] 260 261 262 263 264 265 266 267 268 269 ...
  .. ..$ : int [1:18] 286 287 288 289 290 291 292 293 294 295 ...
  .. ..$ : int [1:3] 304 305 306
  .. ..$ : int [1:61] 307 308 309 310 311 312 313 314 315 316 ...
  .. ..$ : int [1:2] 338 339
  .. ..$ : int [1:24] 340 341 342 343 344 345 346 347 348 349 ...
  .. ..$ : int [1:3] 364 365 366
  .. ..$ : int [1:32] 367 368 369 370 371 372 373 374 375 376 ...
  .. ..$ : int [1:15] 399 400 401 402 403 404 405 406 407 408 ...
  .. ..$ : int [1:24] 457 458 459 460 461 462 463 464 465 466 ...

If the above is a little too much here is a carbon copy of the 1st 12 lines.


    notime <-data.frame (
  Id=c(1503960366,1503960366,1503960366,4319703577,4319703577,4319703577,5553957443,5553957443,5553957443,6117666160,6117666160,6117666160),
  date=c("2016-04-27", "2016-04-12","2016-04-27", "2016-04-12", "2016-04-27","2016-04-27","2016-5-16","2016-4-16","2016-4-16","2016-04-24","2016-04-24","2016-04-28"),
  WeightPounds= c(115,NA, NA,NA,NA,135,NA,137,NA,NA,185,184),
  TotalSleepRecords=c(NA,1,1,1,2,NA,2,1,1,1,2,NA),
  TotalMinutesAsleep=c(NA,405,555,214,355, NA,495 ,222,542,228,350,NA))

I tried the following code chunk


notimesw%>%  group_by(date,Id) %>% 
  summarize(WeightPounds=mean(WeightPounds,na.rm=TRUE),
            TotalSleepRecords=mean(TotalSleepRecords,na.rm=TRUE),
            TotalMinutesAsleep=mean(TotalMinutesAsleep,na.rm=TRUE))

And received the following


 WeightPounds TotalSleepRecords TotalMinutesAsleep
1     158.8118          1.118644           419.4673

I have tried another suggestion


     aggregate(df_merge[-1], list(df_merge$Id), FUN = mean, na.rm = TRUE)

Which combine the Ids with the same dates but the other dates are missing.

     Group.1       date WeightPounds TotalSleepRecords TotalMinutesAsleep
1  1503960366 2016-04-27     115.9631          1.080000           360.2800
2  1644430081 2016-05-02          NaN          1.000000           294.0000
3  1844505072 2016-04-25          NaN          1.000000           652.0000
4  1927972279 2016-04-17     294.3171          1.600000           417.0000
5  2026352035 2016-04-27          NaN          1.000000           506.1786
6  2320127002 2016-04-23          NaN          1.000000            61.0000
7  2347167796 2016-04-21          NaN          1.000000           446.8000
8  2873212765 2016-05-01     125.6635               NaN                NaN
9  3977333714 2016-04-25          NaN          1.142857           293.6429
10 4020332650 2016-04-30          NaN          1.000000           349.3750
11 4319703577 2016-04-27     159.5045          1.038462           476.6538
12 4388161847 2016-04-27          NaN          1.291667           403.1250
13 4445114986 2016-04-27          NaN          1.392857           385.1786
14 4558609924 2016-04-29     153.5299          1.000000           127.6000
15 4702921684 2016-04-26          NaN          1.071429           421.1429
16 5553957443 2016-04-27          NaN          1.225806           463.4839
17 5577150313 2016-04-24     199.9593          1.038462           432.0000
18 6117666160 2016-04-26          NaN          1.222222           478.7778
19 6775888955 2016-04-14          NaN          1.000000           349.6667
20 6962181067 2016-04-27     135.7019          1.096774           448.0000
21 7007744171 2016-04-23          NaN          1.000000            68.5000
22 7086361926 2016-04-27          NaN          1.000000           453.1250
23 8053475328 2016-04-26          NaN          1.000000           297.0000
24 8378563200 2016-04-26          NaN          1.125000           443.3438
25 8792009665 2016-04-24          NaN          1.000000           435.6667
26 8877689391 2016-04-26     187.7144               NaN                NaN
> 

Hopefully, I explained it correctly. Thanks In advance for your help. It is greatly appreciated.

  • I do not understand. The group_by - summarise solution looks fine to me. Otherwise, please provide your expected output – mnist Nov 24 '21 at 21:16
  • Your `dplyr` code looks fine, but it seems like your `group_by` is ignored. Generally this means you loaded the old `plyr` package **after** loading `dplyr` and ignored the warnings about how this can cause problems. You can force the `dplyr` versions to run by specifying `dplyr::summarize`. See [this FAQ](https://stackoverflow.com/q/26106146/903061) for more info. – Gregor Thomas Nov 24 '21 at 21:18
  • You can also simplify the code with `notimesw%>% group_by(date,Id) %>% summarize(across(everything(), mean, na.rm = TRUE))`, assuming you want all columns summarized. – Gregor Thomas Nov 24 '21 at 21:19
  • @GregorThomas You are awesome. It was as simple as switching to dplyr::summarize .Thank you so much and thanks to everyone who replied. Enjoy the rest of your day. – arobbie jammer Nov 24 '21 at 21:37
  • @GregorThomas Quick question is there a site or learning tool besides the cheatsheets that can provide a breakdown of each function within the package and the sub-functions that can be used with it . Something that will allow me to dive deeper into the structure and placement of the code or function. I am trying to learn how to structure my code chunks. I am not coming across anything besides the basics. Thanks in advance. – arobbie jammer Nov 24 '21 at 22:21
  • The R for Data Science book is free online and is probably the best way to go. There are also [9 package vignettes](https://cran.r-project.org/web/packages/dplyr/) which are very good. – Gregor Thomas Nov 24 '21 at 22:34
  • @GregorThomas Thank you again. This is a great help. – arobbie jammer Nov 24 '21 at 22:49

0 Answers0