I am working with some Census API data, and some of the columns contain a number of NAs. I am replacing the NAs with column means, but I figured the imputed values would be more accurate if I split the data by county first and used each county's means. This is where my problem lies: I am unable to merge the pieces back into a single data frame correctly. I know that unsplit doesn't work for lists of data frames, so, as per other posts, I am using

do.call("rbind", hdi_tract$county)

instead. It seems to have worked, but now I have more NA values than I started with before splitting the data. Why is this the case?

  • 4
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 18 '19 at 16:37
  • 1
    You can use `bind_rows` from package `dplyr` instead of `do.call(rbind, ...)`. Alternatively, you could also process your list with `map_dfr` from the `purrr` package to apply an operation to all elements of the list and bind the results into a dataframe. Better yet, don't split your data in the first place if you can use `group_by(county)` instead! For an intro to `dplyr` see https://dplyr.tidyverse.org/ – asachet Nov 18 '19 at 16:40
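
A minimal sketch of the "don't split in the first place" idea from the comment above, written in base R with `ave()` rather than `dplyr` so that it runs without extra packages. The toy data, the column name `income`, and the helper `impute_mean` are assumptions for illustration only; they are not taken from the question's actual data.

```r
# Hypothetical toy data standing in for the Census tracts (assumed columns).
hdi_tract <- data.frame(
  county = c("A", "A", "A", "B", "B", "C"),
  income = c(10, NA, 30, 5, NA, NA)
)

# Hypothetical helper: replace NAs in x with the mean of its non-missing values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# ave() applies the helper within each county and returns the values in the
# original row order, so nothing has to be split and re-bound.
hdi_tract$income_filled <- ave(hdi_tract$income, hdi_tract$county,
                               FUN = impute_mean)

# Compare the NA counts before and after imputation.
na_before <- sum(is.na(hdi_tract$income))
na_after  <- sum(is.na(hdi_tract$income_filled))
```

In this toy data, county C has no observed values at all, so its group mean is `NaN` and that row is still counted by `is.na()` after imputation. Groups like that may be worth checking for in the real data. With `dplyr`, the same helper could be applied inside `group_by(county)` followed by `mutate()`.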

0 Answers