1

I am having problem with multiple conditions in R. My data is like this:

Region in UK     Year        Third column (year.city) 
Liverpool        2008           
Manchester       2010
Liverpool        2016
Chester          2015
Birmingham       2016
Blackpool        2012
Birmingham       2005
Chester          2009
Liverpool        2005
Hull             2011
Leeds            2013
Liverpool        2014
Bradford         2008
London           2010
Coventry         2009
Cardiff          2016 
Liverpool        2007

What I want to create is a third column in a way that it has for groups in it: Liverpool before 2010, Liverpool after 2010, Other cities before 2010, other cities after 2010. I tried couple of codes like mutate but could not solve it. May you please help me to do it? Thanks

Mikhail Burshteyn
  • 4,762
  • 14
  • 27
Eshmel
  • 49
  • 2
  • 9
  • Possible duplicate of [can dplyr package be used for conditional mutating?](https://stackoverflow.com/questions/24459752/can-dplyr-package-be-used-for-conditional-mutating) – divibisan Aug 31 '18 at 16:06
  • 1
    Your best bet is to combine `mutate` with `case_when` (both from `dplyr`) as described in the linked question – divibisan Aug 31 '18 at 16:06
  • can you show the expected output – akrun Aug 31 '18 at 16:09
  • Thanks divibisan. I would try it. – Eshmel Aug 31 '18 at 16:09
  • akrun, the expected output is some thing like this: – Eshmel Aug 31 '18 at 16:11
  • Liverpool 2005=1 Liverpool 2011=2 Birmingham 2005=3 Birmingham 2016=4. The Intervention year is 2010. For years before 2010 in Liverpool the value will be 1, over 2010 it will be 2, for other cities it will be 3 and 4, respectively. – Eshmel Aug 31 '18 at 16:15

4 Answers4

1

I would do this as @dvibisan suggested and use dplyr.

# Create a dataframe
df <- structure(list(`Region in UK` = c("Liverpool", "Manchester", "Liverpool", 
                                        "Chester", "Birmingham", "Blackpool", "Birmingham", "Chester", 
                                        "Liverpool", "Hull", "Leeds", "Liverpool", "Bradford", "London", 
                                        "Coventry", "Cardiff", "Liverpool"), 
                     Year = c(2008L, 2010L, 2016L, 2015L, 2016L, 2012L, 2005L, 2009L, 2005L, 2011L, 2013L, 2014L, 2008L, 2010L, 2009L, 2016L, 2007L)), 
                row.names = c(NA, -17L), class = c("data.table", "data.frame"))

# Load the dplyr library to use mutate and if_else (if there were more than 2 conditions of interest for each variable could use case_when)
library(dplyr) 

# Create a new column using mutate, pasting together two conditions
df <-
  df %>% 
  mutate(`Third column (year.city)` = paste0(if_else(grepl("Liverpool", `Region in UK`, fixed = TRUE), `Region in UK`, "Other cities"),
                                             if_else(Year < 2010, " before 2010", " 2010 or after")))
Kerry Jackson
  • 1,821
  • 12
  • 20
  • The only problem is that now I dont see the new column in the data table. It brings it in console, a little chunk of it, but not in the data table. – Eshmel Aug 31 '18 at 20:59
  • I edited the answer to assign the revised dataframe to df, instead of printing it to the console – Kerry Jackson Sep 02 '18 at 15:49
1

The easiest way I think is using vectorisation with base R:

# create index of categories
vec <- c("Other cities after 2010", "Liverpool after 2010", "Other cities before 2010", "Liverpool before 2010")
# create index vector
ix <- 1 + (df$Region.in.UK == "Liverpool") + 2*(df$Year < 2010)

# index the categories-vector with the index-vector
df$year.city <- vec[ix]

The result:

> df
   Region.in.UK Year                year.city
1     Liverpool 2008    Liverpool before 2010
2    Manchester 2010  Other cities after 2010
3     Liverpool 2016     Liverpool after 2010
4       Chester 2015  Other cities after 2010
5    Birmingham 2016  Other cities after 2010
6     Blackpool 2012  Other cities after 2010
7    Birmingham 2005 Other cities before 2010
8       Chester 2009 Other cities before 2010
9     Liverpool 2005    Liverpool before 2010
10         Hull 2011  Other cities after 2010
11        Leeds 2013  Other cities after 2010
12    Liverpool 2014     Liverpool after 2010
13     Bradford 2008 Other cities before 2010
14       London 2010  Other cities after 2010
15     Coventry 2009 Other cities before 2010
16      Cardiff 2016  Other cities after 2010
17    Liverpool 2007    Liverpool before 2010
h3rm4n
  • 4,126
  • 15
  • 21
0

Try this

Region_in_UK = c( "Liverpool", "Manchester", "Liverpool", "Chester", 
"Birmingham", "Blackpool", "Birmingham", "Chester", "Liverpool", "Hull", 
"Leeds", "Liverpool", "Bradford", "London", "Coventry", "Cardiff", "Liverpool")
Year = c(2008, 2010, 2016, 2015, 2016, 2012, 2005, 2009, 2005, 2011, 2013,
2014, 2008, 2010, 2009, 2016, 2007)
df = data.frame(Region_in_UK, Year)

# erase the code above and replace your own dataframe if its bigger 
# than the data you displayed at this point and name it "df" (e.g.: 
# df = your_dataframe)

df$year_city = rep(NA, dim(df)[1])
df = mutate(df, year_city = 
              ifelse (grepl("Liverpool", df$Region_in_UK) &  df$Year < 2010, 
                      "Liverpool before 2010", df$year_city))
df = mutate(df, year_city = 
              ifelse (grepl("Liverpool", df$Region_in_UK) &  df$Year >= 2010, 
                      "Liverpool 2010 and after", df$year_city))
df = mutate(df, year_city = 
              ifelse (!grepl("Liverpool", df$Region_in_UK) &  df$Year < 2010, 
                      "Other before 2010", df$year_city))
df = mutate(df, year_city = 
              ifelse (!grepl("Liverpool", df$Region_in_UK) &  df$Year >= 2010, 
                      "Other 2010 and after", df$year_city))
ecjb
  • 5,169
  • 12
  • 43
  • 79
0

Using base R you could do:

 transform(df, year.city = factor(paste(sub('^((?!Liver).)*$', 'other', Region_in_UK,perl = TRUE), Year>2010), label=1:4))

   Region_in_UK Year year.city
1     Liverpool 2008         1
2    Manchester 2010         3
3     Liverpool 2016         2
4       Chester 2015         4
5    Birmingham 2016         4
6     Blackpool 2012         4
7    Birmingham 2005         3
8       Chester 2009         3
9     Liverpool 2005         1
10         Hull 2011         4
11        Leeds 2013         4
12    Liverpool 2014         2
13     Bradford 2008         3
14       London 2010         3
15     Coventry 2009         3
16      Cardiff 2016         4
17    Liverpool 2007         1

You can also do:

transform(df,m=factor(paste(!grepl("Liverpool",Region_in_UK),Year>2010),label=1:4))

or

transform(df,m = factor(paste(sub('(Liverpool)|.*','\\1',Region_in_UK),Year<=2010),label=4:1))
   Region_in_UK Year m
1     Liverpool 2008 1
2    Manchester 2010 3
3     Liverpool 2016 2
4       Chester 2015 4
5    Birmingham 2016 4
6     Blackpool 2012 4
7    Birmingham 2005 3
8       Chester 2009 3
9     Liverpool 2005 1
10         Hull 2011 4
11        Leeds 2013 4
12    Liverpool 2014 2
13     Bradford 2008 3
14       London 2010 3
15     Coventry 2009 3
16      Cardiff 2016 4
17    Liverpool 2007 1
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • @Eshmel how do you deal with the years 2010? Are they considered as 1 or 2 for liverpool and as 3 or 4 for others? if 1 and 3 then in the above solution use `Year>=2010` else leave it as `Year>2010` – Onyambu Aug 31 '18 at 17:11
  • @Eshmel if this works, then you should accept the answer by clicking on the check sign against the answer in order to close the question – Onyambu Aug 31 '18 at 19:03
  • The only problem is that now I dont see the new column in the data table. It brings it in console, a little chunk of it, but not in the data table. – Eshmel Aug 31 '18 at 20:39