I am coding in R and have a data frame of regions such as:

data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"), 
                   Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

The regions have been grouped together, and the Group column tells which group each region belongs to. Given a group number, how can I find the regions that make up that group? Any help is really appreciated.

camille
Bruce Wayne
  • Please add sample data in a reproducible and copy&paste-able format, e.g. using `dput`. Screenshots are never a good idea, as we can't copy&paste data/code. – Maurits Evers Mar 20 '19 at 22:31
  • The *output* of `dput` goes into the body of the question, so we can see your sample of data, not just that phrase in the title. I'm taking it out of the title, but [see here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that's easy for folks to answer – camille Mar 21 '19 at 03:27

2 Answers

With a small, reproducible example,

data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"), Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

say we want all regions in group 1:

group.number <- 1
data[data$Group == group.number, "Region"]
[1] "Cali"  "Vegas"
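A minimal sketch that wraps the same indexing in a reusable function. `regions_in_group` is a hypothetical helper name, not part of any package, and using `%in%` instead of `==` is an assumption that lets it accept several group numbers at once.

```r
# Sample data from the question
data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"),
                   Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

# Hypothetical helper: return the regions belonging to one or more groups
regions_in_group <- function(df, groups) {
  df$Region[df$Group %in% groups]
}

regions_in_group(data, 1)        # "Cali" "Vegas"
regions_in_group(data, c(1, 2))  # "Cali" "NYC" "LA" "Vegas" (row order)
```

A group number that does not occur simply returns an empty character vector.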

Or using dplyr:

library(dplyr)
group.number <- 1
data %>%
  filter(Group == group.number) %>%
  pull(Region)

Or, as suggested by Jilber Urbina (much more readable):

subset(data, Group==1)$Region
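If you need to look up many groups, one option (a swapped-in base R technique, not part of the answer above) is to build a lookup table once with `split()`. It returns a named list with one character vector per group; the name `regions_by_group` is illustrative.

```r
data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"),
                   Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

# split() returns a named list: one character vector of regions per group
regions_by_group <- split(data$Region, data$Group)

# Names are the group values coerced to character, so index with "1", "2", ...
regions_by_group[["1"]]  # "Cali" "Vegas"
regions_by_group[["2"]]  # "NYC" "LA"
```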
Hector Haffenden
Most importantly, for future posts please

  1. include sample data in a reproducible and copy&paste-able format, e.g. using dput
  2. refrain from adding superfluous statements like "This one is super urgent!"
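For example, running `dput()` on the question's data frame prints R code that can be pasted straight into a post. The round-trip check at the end (the `rebuilt` name is illustrative) only confirms that the printed code recreates the same object; it is not something you need to post.

```r
data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"),
                   Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

# dput() prints code that recreates the object; paste that output into the question
dput(data)

# Sanity check: evaluate the printed code and compare with the original
code <- paste(capture.output(dput(data)), collapse = "\n")
rebuilt <- eval(parse(text = code))
identical(rebuilt, data)
```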

As to your question, I'll first generate some sample data:

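set.seed(2018)
# Note: sample() changed in R 3.6.0, so newer R versions may draw different values than shown below
df <- data.frame(
    Region = sample(letters, 10),
    Group = sample(1:3, 10, replace = TRUE))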

I recommend summarising/aggregating data by Group, which will make it easy to extract information for specific Groups.

For example, in base R you can aggregate the data by Group and concatenate all Regions per Group:

aggregate(Region ~ Group, data = df, FUN = toString)
#  Group        Region
#1     1             m
#2     2    i, l, g, c
#3     3 b, e, k, r, j

Alternatively, you can store all Regions per Group in a list:

aggregate(Region ~ Group, data = df, FUN = list)
#  Group        Region
#1     1             m
#2     2    i, l, g, c
#3     3 b, e, k, r, j

Note that while the output looks identical, toString creates a character string, while list stores the Regions in a list. The latter might be a better format for downstream processing.
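A short sketch of that downstream processing, run on the question's four-row `data` instead of the random `df` so the result is reproducible: the `Region` column produced with `FUN = list` is a list column, so a group's regions come back as a plain character vector without any string splitting. The name `agg` is illustrative.

```r
data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"),
                   Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

agg <- aggregate(Region ~ Group, data = data, FUN = list)

# Region is a list column: each element is a character vector of regions
is.list(agg$Region)

# Pull out the regions of group 2 as a plain character vector
agg$Region[[which(agg$Group == 2)]]  # "NYC" "LA"

# Count regions per group without re-splitting strings
lengths(agg$Region)
```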


Similar output can be achieved using dplyr:

library(dplyr)
df %>%
    group_by(Group) %>%
    summarise(Region = toString(Region))
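Another base R option (a swapped-in alternative, not from the answer above) is `tapply()`, sketched here on the question's `data`. It returns a one-dimensional array named by group value, so the concatenated regions of one group can be looked up by name; `region_strings` is an illustrative name.

```r
data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas"),
                   Group = c(1, 2, 2, 1), stringsAsFactors = FALSE)

# tapply() applies toString to the regions of each group and returns
# a one-dimensional array named by group value
region_strings <- tapply(data$Region, data$Group, toString)

region_strings[["1"]]  # "Cali, Vegas"
region_strings[["2"]]  # "NYC, LA"
```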
Maurits Evers
  • Is there a way to remove the duplicates and just get the unique values using the dplyr code? Because the data would have a lot of repetition. Also, any way to store the output as list with dplyr? – Bruce Wayne Mar 20 '19 at 23:21
  • @BruceWayne You can use `unique` to avoid duplicates: e.g. `... summarise(Region = toString(unique(Region)))`; the base R solution `aggregate(Region ~ Group, data = df, FUN = list)` already contains a `list` column `Region`. More importantly, you need to edit your post to include data in a reproducible and copy&paste-able format! You've been asked for this multiple times now. – Maurits Evers Mar 20 '19 at 23:23
  • I did add some data. Would add more going forward – Bruce Wayne Mar 20 '19 at 23:59
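In base R, the `unique` suggestion from the comments can be sketched by passing an anonymous function to `aggregate()`. The data below are hypothetical: repeated regions were added to mimic the repetition the asker describes, and `dedup_strings` / `dedup_lists` are illustrative names.

```r
# Hypothetical data with repeated regions
data <- data.frame(Region = c("Cali", "NYC", "LA", "Vegas", "Cali", "NYC"),
                   Group = c(1, 2, 2, 1, 1, 2), stringsAsFactors = FALSE)

# Base R counterpart of summarise(Region = toString(unique(Region)))
dedup_strings <- aggregate(Region ~ Group, data = data,
                           FUN = function(x) toString(unique(x)))

# Or keep the de-duplicated regions as a list column, one vector per group
dedup_lists <- aggregate(Region ~ Group, data = data,
                         FUN = function(x) list(unique(x)))

dedup_strings
dedup_lists$Region[[1]]  # "Cali" "Vegas"
```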