How to create multiple subsets of data based on a column name without specifying all the unique names in said column (R)

Question

Sorry if this question is idiotically simple but for some reason I can't figure it out.

I have a rather large dataset consisting of different epidemiological measures for all countries and a given year.

Country   Date   Measure
UK        2013   X
UK        2014   Y
UK        2015   Z
....

I'd like to create subsets of this dataset for each country

I'd like to do it without having to specify the name of each country in my code.

e.g. without doing this:

head(filter(Dataset,country=="UK"))

What do you want to do with those subsets? Save them? Perform any process/analysis? Something else? — AntoniosK, Mar 29 '18 at 16:09
`split(Dataset, Dataset$Country)` if you are comfortable working with lists. — hpesoj626, Mar 29 '18 at 16:10
Thank you for the quick reply. I want to use these subsets for further analysis later on so I'd like to in essence create a new dataset for each country — willepi, Mar 29 '18 at 16:10
You can use lists, as @hpesoj626 suggested, or try grouping with dplyr: `group_by(Dataset, country)` and all dplyr operations will perform separately for each group. — csgroen, Mar 29 '18 at 16:12
Can you give a specific example how you are going to use the subsets? A minimal example will do. — hpesoj626, Mar 29 '18 at 16:14
Yes of course, I intend to calculate measures of incidence and prevalence for each country per year and present them as separate tables for a poster. — willepi, Mar 29 '18 at 16:16
Try to keep it as is and work with `dplyr::group_by` for next step, split it as a list of `data.frames` only to feed the different `data.frames` to a function that wouldn't handle `group_by` (and then tidy up the result in a data.frame as soon as you get them back). The exception is if there are performance issues, then it might be better to split and work on the list all along. But don't create one variable per df. — moodymudskipper, Mar 29 '18 at 16:18
you can present them as a separate table by using split then lapply and your printing function at the last step — moodymudskipper, Mar 29 '18 at 16:20
Thank you so much everyone - Splitting into a list did exactly what I needed. Recently left the world of SAS so R is quite new (and amazingly exciting)! — willepi, Mar 29 '18 at 16:23
Possible duplicate of [R - split data frame and save to different files](https://stackoverflow.com/questions/33426973/r-split-data-frame-and-save-to-different-files) — hpesoj626, Mar 29 '18 at 16:26
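
As a minimal sketch of the `split` + `lapply` approach suggested in the comments above: the data below is made up to match the shape of the table in the question (the column names `Country`, `Date`, `Measure` and all values are assumptions), and `mean` is only a placeholder for whatever per-country calculation is actually needed.

```r
# Hypothetical data shaped like the table in the question
Dataset <- data.frame(
  Country = c("UK", "UK", "UK", "USA", "USA"),
  Date    = c(2013, 2014, 2015, 2013, 2014),
  Measure = c(10, 12, 15, 30, 33),
  stringsAsFactors = FALSE
)

# One data frame per country, held in a single named list;
# no country name is ever typed out by hand
by_country <- split(Dataset, Dataset$Country)

names(by_country)     # the country names found in the data
by_country[["UK"]]    # the subset for one country

# Apply the same step to every subset; mean() is a placeholder
country_means <- lapply(by_country, function(df) mean(df$Measure))
```

Keeping the subsets in one list, rather than creating one variable per country, means any later step can be applied to all of them with a single `lapply` call.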

AntoniosK · Accepted Answer · 2018-03-29T16:30:15.957

4

Check this simple example if you want to create a subset based on each country and save that subset as a .csv:

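# Example data: one row per country and year
dt = data.frame(country = c("UK","UK","USA"),
                date = c("2013","2014","2013"),
                measure = 1:3, stringsAsFactors = FALSE)

library(tidyverse)

# Split into one data frame per country, then write each one to "<country>.csv"
dt %>%
  split(.$country) %>%
  map2(.x = ., .y = paste0(names(.),".csv"), ~write.csv(.x, .y, row.names = FALSE))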

The .csv files will be saved in your working directory.

Alternatively, you can use map instead of map2 and create the file names from the country column value (you know it will be unique):

dt %>%
  split(.$country) %>%
  map(.x = ., ~write.csv(.x, paste0(unique(.x$country),".csv"), row.names = FALSE))

edited Mar 29 '18 at 16:30

answered Mar 29 '18 at 16:23


This is exactly what I was looking for - Thank you so much :) – willepi Mar 29 '18 at 16:27
better to use `walk` and `walk2` for side effects – moodymudskipper Mar 29 '18 at 16:34
what do you mean @Moody_Mudskipper ? – AntoniosK Mar 29 '18 at 16:36
and you can use `imap` or `iwalk` if you want to take the names as the `.y` parameter, so the last line can become : `iwalk(~write.csv(.x, paste0(.y,".csv"), row.names = F))` in your first option. – moodymudskipper Mar 29 '18 at 16:38
I mean that here you're using `map` functions, which return lists. The `walk` functions do the same as `map` but behave silently, and they also signal to the reader that the output is not important but the side effect is. `?purrr::walk` – moodymudskipper Mar 29 '18 at 16:40
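
For reference, a minimal sketch of the `iwalk` variant described in the comments above, reusing the answer's example data. Writing to `tempdir()` via the `out_dir` variable is an assumption made here so the example does not touch the working directory; the answer itself writes to the working directory.

```r
library(purrr)

# Same example data as in the answer
dt <- data.frame(country = c("UK", "UK", "USA"),
                 date = c("2013", "2014", "2013"),
                 measure = 1:3, stringsAsFactors = FALSE)

# Assumption: write into a temporary directory instead of the working directory
out_dir <- tempdir()

# Named list with one data frame per country
subsets <- split(dt, dt$country)

# iwalk() passes each element as .x and its name as .y,
# and is called only for its side effect (writing the files)
iwalk(subsets, ~ write.csv(.x, file.path(out_dir, paste0(.y, ".csv")),
                           row.names = FALSE))
```

Unlike `map`, nothing is printed to the console, since `iwalk` returns its input invisibly.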
