
Please forgive me if the answer to this is obvious; I am very new to R.

I am trying to aggregate this set of data, but one of the columns keeps returning NA.

> dput(head(DrivingDistance,50))
structure(list(player_name = c("Brian Stuard", "Billy Hurley III", 
"Greg Chalmers", "William McGirt", "Russell Knox", "Cody Gribble", 
"Tony Finau", "Dustin Johnson", "Justin Thomas", "Vaughn Taylor", 
"Jason Day", "Brendan Steele", "Si Woo Kim", "Brandt Snedeker", 
"Jason Dufner", "Ryan Moore", "Rod Pampling", "Fabián Gómez", 
"Jimmy Walker", "Jim Herman", "Pat Perez", "Daniel Berger", "Patrick Reed", 
"James Hahn", "Mackenzie Hughes", "Branden Grace", "Jordan Spieth", 
"Hideki Matsuyama", "Charley Hoffman", "Jhonattan Vegas", "Aaron Baddeley", 
"Bubba Watson", "J.T. Poston", "Shawn Stefani", "Stewart Cink", 
"William McGirt", "Fabián Gómez", "David Lingmerth", "Henrik Norlander", 
"Tim Wilkinson", "Gonzalo Fernandez-Castaño", "Daniel Summerhays", 
"Webb Simpson", "Peter Malnati", "Jason Bohn", "Vaughn Taylor", 
"Daniel Berger", "Zac Blair", "Ryan Brehm", "Chez Reavie"), date = structure(c(17174, 
17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 
17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 
17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 
17174, 17174, 17174, 17174, 17181, 17181, 17181, 17181, 17181, 
17181, 17181, 17181, 17181, 17181, 17181, 17181, 17181, 17181, 
17181, 17181, 17181, 17181), class = "Date"), DrDis = c("263.1", 
"265.4", "266.5", "267.9", "269.3", "270.8", "304.8", "319.6", 
"301.6", "269.6", "300.4", "288.5", "271.6", "271.9", "272.0", 
"272.6", "275.1", "275.4", "275.6", "276.6", "278.4", "278.5", 
"279.3", "279.8", "280.4", "283.3", "283.4", "283.6", "286.0", 
"286.3", "287.9", "300.3", "304.3", "304.1", "304.0", "303.9", 
"303.5", "303.3", "304.5", "303.0", "301.6", "301.6", "299.6", 
"298.9", "297.6", "296.3", "302.6", "295.1", "305.3", "305.5"
)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
))

Here is the output I get after trying to aggregate:

   player_name    date       DrDis
   <chr>          <date>     <dbl>
 1 A.J. McInerney 2018-02-21    NA
 2 Aaron Baddeley 2018-08-01    NA
 3 Aaron Rai      2019-06-06    NA
 4 Aaron Wise     2018-10-28    NA
 5 Abraham Ancer  2019-02-13    NA
 6 Adam Bland     2018-03-04    NA
 7 Adam Hadwin    2018-08-11    NA
 8 Adam Long      2019-09-22    NA
 9 Adam Schenk    2019-03-03    NA
10 Adam Scott     2018-08-12    NA
# ... with 551 more rows
There were 50 or more warnings (use warnings() to see the first 50)

Here is the code I am using to create DrivingDistance and then aggregate it:

DrivingDistance <- CurrentData[CurrentData$statistic == 'Driving Distance' & CurrentData$variable == 'AVG.', ] %>%
  select(player_name, date, value) %>%
  dplyr::rename(DrDis = value)


DrivingDistance %>%
  group_by(player_name) %>%
  summarize_all(mean, na.rm = TRUE)
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Do you have NA values in your data? Sounds like you do. Most likely a duplicate of: https://stackoverflow.com/questions/14261619/subsetting-r-data-frame-results-in-mysterious-na-rows Since you seem to be using `dplyr`, use `CurrentData %>% filter(statistic == 'Driving Distance' & variable == 'AVG.')` rather than `[,]`. – MrFlick Aug 01 '20 at 04:47
  • You could use `dput(head(CurrentData))` to help generate a workable subset of your data... – beroe Aug 01 '20 at 06:58
  • Is there a column named `value` in your data frame? Your example output doesn't match your example command (has other fields), so it would be useful to see `CurrentData` and not what seems to be `DrivingDistance`. I also avoid using `date` as a variable name since it has other meanings. – beroe Aug 01 '20 at 07:03
  • @beroe Sorry, I should have been more specific. I just updated the original post with the head of CurrentData. `value` is in CurrentData, but I rename it to `DrDis` in DrivingDistance. – Karsen Mitsche Aug 01 '20 at 12:40
  • @MrFlick I just looked back at every value in the dataset and there are no NA's. I also edited the original post so you can see the head of CurrentData. – Karsen Mitsche Aug 01 '20 at 12:42
  • @KarsenMitsche More data is needed in order to help you. Try `dput(head(CurrentData,50))` and paste the output into the question. – Duck Aug 01 '20 at 12:53
  • @Duck Just added that into the main post. – Karsen Mitsche Aug 01 '20 at 12:56
  • @KarsenMitsche Your data looks more complete now. I have spotted a possible issue: check `$ DrDis : chr "43' 10\""`. On your real data, please run `str(DrivingDistance$DrDis)`; if the output is character, that is what is producing `NA`. – Duck Aug 01 '20 at 13:01
  • @KarsenMitsche To help you, that variable should be numeric. Could you please tell us the unit of that variable? It looks like minutes and seconds. You could replace the `dput()` you posted with `dput(head(DrivingDistance,50))` and I could help with your issue! – Duck Aug 01 '20 at 13:04
  • @Duck Great, thanks! Just changed the main post with the return of dput(head(DrivingDistance,50)) – Karsen Mitsche Aug 01 '20 at 13:09
  • @KarsenMitsche I have added a possible solution as an answer; please check it. It looks like your main problem was the character type, so you must convert it first. I hope that helps :) – Duck Aug 01 '20 at 13:15
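
The diagnosis from the comments can be reproduced in a few lines of base R. This is only a minimal sketch: the three values are copied from the `dput()` output above, the `43' 10"` string is the one quoted in the comments, and the variable names `bad`, `good` and `n_unparsed` are made up for illustration.

```r
# DrDis as it appears in the dput() output: numbers stored as text
DrDis <- c("263.1", "265.4", "266.5")

str(DrDis)  # reports a character vector, not a numeric one

# mean() does not accept character input: it warns and returns NA
bad <- suppressWarnings(mean(DrDis, na.rm = TRUE))

# Converting to numeric first gives a usable average
good <- mean(as.numeric(DrDis), na.rm = TRUE)

# Strings that cannot be parsed as numbers become NA on conversion,
# so it is worth counting them before trusting the result
mixed <- c(DrDis, "43' 10\"")
n_unparsed <- sum(is.na(suppressWarnings(as.numeric(mixed))))
```

Note that `na.rm = TRUE` does not help in the first call, because the problem is the column type rather than missing values.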

1 Answer


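`DrDis` is stored as character, so `mean()` cannot average it and returns NA with a warning. Convert the column to numeric before aggregating: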

DrivingDistance %>% mutate(DrDis=as.numeric(DrDis)) %>%
  group_by(player_name) %>%
  summarize_all(mean, na.rm = TRUE)

# A tibble: 46 x 3
   player_name      date       DrDis
   <chr>            <date>     <dbl>
 1 Aaron Baddeley   2017-01-08  288.
 2 Billy Hurley III 2017-01-08  265.
 3 Branden Grace    2017-01-08  283.
 4 Brandt Snedeker  2017-01-08  272.
 5 Brendan Steele   2017-01-08  288.
 6 Brian Stuard     2017-01-08  263.
 7 Bubba Watson     2017-01-08  300.
 8 Charley Hoffman  2017-01-08  286 
 9 Chez Reavie      2017-01-15  306.
10 Cody Gribble     2017-01-08  271.
# ... with 36 more rows
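
A possible variation, offered as a sketch rather than as part of the original answer: `summarize_all()` also averages the `date` column, which may not be wanted. Assuming only the driving distance needs to be averaged, the column can be named explicitly in `summarise()`. The data frame `dd` below is a small hand-made subset of the posted data, and `n_events` is a hypothetical extra column counting rows per player.

```r
library(dplyr)

# Small subset in the same shape as DrivingDistance (values from the dput() output)
dd <- data.frame(
  player_name = c("William McGirt", "Vaughn Taylor",
                  "William McGirt", "Vaughn Taylor"),
  date = as.Date(c("2017-01-08", "2017-01-08", "2017-01-15", "2017-01-15")),
  DrDis = c("267.9", "269.6", "303.9", "296.3"),
  stringsAsFactors = FALSE
)

avg <- dd %>%
  mutate(DrDis = as.numeric(DrDis)) %>%        # character -> numeric
  group_by(player_name) %>%
  summarise(
    DrDis = mean(DrDis, na.rm = TRUE),         # average distance per player
    n_events = n()                             # number of rows averaged
  )

print(avg)
```

Naming the column keeps `date` out of the result; if a date is needed, something like `max(date)` could be added to the `summarise()` call.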
Duck