R: Mapping the indicator column to what constitutes the column

Question

I am coding in R and I have a data frame of regions such as:

data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"),
                   Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

The regions have been combined into groups, and the Group column indicates which group each region belongs to. Given a group number, how can I find the regions that make up that group? Any help is really appreciated.

Please add sample data in a reproducible and copy&paste-able format, e.g. using `dput`. Screenshots are never a good idea, as we can't copy&paste data/code. — Maurits Evers, Mar 20 '19 at 22:31
The *output* of `dput` goes into the body of the question, so we can see your sample of data, not just that phrase in the title. I'm taking it out of the title, but [see here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that's easy for folks to answer — camille, Mar 21 '19 at 03:27

Hector Haffenden · Answer 1 · 2019-03-20T22:34:58.720


So with a small, reproducible example,

data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"), Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

we can extract the regions of any group. Say we want all regions from group 1:

group.number <- 1
data[data$Group == group.number, "Region"]
[1] "Cali"  "Vegas"
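If you need the regions for every group rather than one group at a time (for example on a bigger dataset), base R's `split()` can build the whole mapping in one pass. A minimal sketch, reusing the sample data above; `regions.by.group` is just an illustrative name:

```r
# Same sample data as in the question
data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"),
                   Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

# split() returns a named list: one character vector of Regions per Group
regions.by.group <- split(data$Region, data$Group)

# Look up a single group by name (the names are the group values as strings)
regions.by.group[["1"]]
regions.by.group[["2"]]
```

The list elements are named by the group values converted to strings, so group 1 is looked up with `[["1"]]`. Plain positional indexing with `[[1]]` only coincides with group 1 here because it happens to be the first group.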

Or using dplyr:

library(dplyr)
group.number <- 1
data %>%
  filter(Group == group.number) %>%
  pull(Region)

Or, following Jilber Urbina's suggestion in the comments (much more readable):

subset(data, Group==1)$Region
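To make the lookup more "dynamic", the column names can be passed in as strings so the same code works for any grouping column. The sketch below assumes a hypothetical helper `members.of()` and a made-up second grouping column `Group2`; neither is part of the original data:

```r
# Hypothetical helper: values of `value.col` for all rows where `group.col`
# equals `group`. which() drops rows where the group column is NA.
members.of <- function(df, group.col, group, value.col = "Region") {
  df[which(df[[group.col]] == group), value.col]
}

data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"),
                   Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)
data$Group2 <- c(1, 1, 2, 2)  # hypothetical second grouping column

members.of(data, "Group", 2)

# Apply the same lookup (group 1) to several grouping columns at once;
# sapply(..., simplify = FALSE) keeps the column names as list names
lookups <- sapply(c("Group", "Group2"),
                  function(col) members.of(data, col, 1),
                  simplify = FALSE)
lookups
```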

edited Mar 20 '19 at 22:34

answered Mar 20 '19 at 22:23


I have a bigger dataset, would you have any suggestions for that? – Bruce Wayne Mar 20 '19 at 22:26
So you require a faster solution? Or with different columns etc? Try adding dput(head(your.data)) to the question. – Hector Haffenden Mar 20 '19 at 22:27

you can also use `subset(data, Group==1)` – Jilber Urbina Mar 20 '19 at 22:30
I don't need to subset the data; I need to create a vector that contains this information. I just need to extract it. Hector's code works, I just need to make it more dynamic – Bruce Wayne Mar 20 '19 at 22:34
@BruceWayne Could you define what you mean by dynamic? – Hector Haffenden Mar 20 '19 at 22:37
Meaning, be able to do this for multiple such columns – Bruce Wayne Mar 21 '19 at 00:11

Maurits Evers · Accepted Answer · 2019-03-20T22:40:22.343

Most importantly, for future posts please:

- include sample data in a reproducible and copy&paste-able format using e.g. `dput`
- refrain from adding superfluous statements like "This one is super urgent!"

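As to your question, first I'll generate some sample data. (The values drawn depend on the R version: R 3.6.0 changed the default algorithm used by `sample()`, so newer versions may not reproduce the output below exactly.)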

set.seed(2018)
df <- data.frame(
    Region = sample(letters, 10),
    Group = sample(1:3, 10, replace = TRUE))

I recommend summarising/aggregating data by Group, which will make it easy to extract information for specific Groups.

For example, in base R you can aggregate the data by Group and concatenate all Regions per Group:

aggregate(Region ~ Group, data = df, FUN = toString)
#  Group        Region
#1     1             m
#2     2    i, l, g, c
#3     3 b, e, k, r, j

Alternatively, you can store all Regions per Group in a list:

aggregate(Region ~ Group, data = df, FUN = list)
#  Group        Region
#1     1             m
#2     2    i, l, g, c
#3     3 b, e, k, r, j

Note that while the printed output looks identical, `toString` creates a single comma-separated character string per Group, while `list` stores each Group's Regions as a vector in a list column. The latter might be a better format for downstream processing.
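As a sketch of that downstream use, the list column can be indexed directly to recover one Group's Regions as a character vector. This uses a small fixed data frame (the hypothetical `toy`) so the result is predictable:

```r
toy <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"),
                  Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

# One row per Group; Region is a list column
agg <- aggregate(Region ~ Group, data = toy, FUN = list)

# [[ ]] on the list column returns the character vector for a single Group
agg$Region[[which(agg$Group == 2)]]
```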

Similar outputs can be achieved using dplyr:

library(dplyr)
df %>%
    group_by(Group) %>%
    summarise(Region = toString(Region))

Is there a way to remove the duplicates and just get the unique values using the dplyr code? Because the data would have a lot of repetition. Also, any way to store the output as list with dplyr? — Bruce Wayne, Mar 20 '19 at 23:21
@BruceWayne You can use `unique` to avoid duplicates: e.g. `... summarise(Region = toString(unique(Region)))`; the base R solution `aggregate(Region ~ Group, data = df, FUN = list)` already contains a `list` column `Region`. More importantly, you need to edit your post to include data in a reproducible and copy&paste-able format! You've been asked for this multiple times now. — Maurits Evers, Mar 20 '19 at 23:23
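Following the comment above, a base R sketch of both deduplicated variants, on made-up data with repeated Regions (`df.dup` is hypothetical). Wrapping the result in `list()` matters: with `FUN = unique` alone, `aggregate()` simplifies the result to a matrix whenever every Group has the same number of unique Regions. In dplyr, `summarise(Region = list(unique(Region)))` should give a comparable list column.

```r
# Hypothetical data with repeated Regions within each Group
df.dup <- data.frame(Region = c("Cali", "Vegas", "Cali", "NYC", "LA", "NYC"),
                     Group = c(1, 1, 1, 2, 2, 2), stringsAsFactors = FALSE)

# One comma-separated string of unique Regions per Group
uniq.string <- aggregate(Region ~ Group, data = df.dup,
                         FUN = function(x) toString(unique(x)))

# A list column of unique Regions per Group (list() prevents matrix simplification)
uniq.list <- aggregate(Region ~ Group, data = df.dup,
                       FUN = function(x) list(unique(x)))

uniq.string
uniq.list
```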
