stataData is the original data:

country
"246"
"246"
"246"
"752"
"752"
"752"
"643"
"643"
"643"
"840"
"840"
"840"

My goal is to add a new column that gets the value 1 if country is 246 or 752, and the value 2 if country is 643 or 840. The goal stataData looks like this:

country area
"246" 1
"246" 1
"246" 1
"752" 1
"752" 1
"752" 1
"643" 2
"643" 2
"643" 2
"840" 2
"840" 2
"840" 2

In this case countries is a list of two data frames (in the real case there can be more data frames in the list, and there will be more countries):

  countries <- list(data.frame(c("FIN", "SWE"), c("246", "752")), data.frame(c("RUS", "USA"), c("643", "840")))
  for(i in length(countries)){
    stataData$area <- ifelse(stataData$country %in% countries[[i]][,2], i, stataData$country )
  }

However, this code returns a data frame (stataData) where only the second set of countries gets a number, like this:

country area
"246" "246"
"246" "246"
"246" "246"
"752" "752"
"752" "752"
"752" "752"
"643" 2
"643" 2
"643" 2
"840" 2
"840" 2
"840" 2

I have tried using is_empty() but haven't found a way to eliminate the problem, so I'm asking how to solve it. In the real case stataData has over 1.5 million observations.

Edit: clarifying the problem.

Let's say that I have

ifelse(df$x1 %in% delta, i, NA) 

where delta is an element of a list and i is a number. Is there a way to increase i as delta moves forward through the list?

Cheers!

jakeRP
  • ifelse is vectorised, no need for for loops. Something like: `stataData$area <- ifelse(stataData$country %in% c("246", "752"), 1, ifelse(stataData$country %in% c("643", "840"), 2, NA))` – zx8754 Sep 06 '21 at 09:53
  • Possible duplicate https://stackoverflow.com/q/18012222/680068 – zx8754 Sep 06 '21 at 09:59
  • Related https://stackoverflow.com/q/35636315/680068 – zx8754 Sep 06 '21 at 09:59
  • Okay, I probably wasn't clear enough. The problem is that the number of groups is unknown: whoever uses the function might want to make 4 country groups but not want to touch the underlying function. So what I tried to ask was whether there is any way to automatically change the yes part of ifelse(). Let's say that I have ifelse(df$x1==delta, i, NA), where delta and i change, e.g. delta is an element of the list and i is a number; when delta changes, i increases. Hopefully this clarifies it a bit. – jakeRP Sep 06 '21 at 10:18
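
For reference, the loop in the question fails for two reasons: `for(i in length(countries))` visits only the last index, and each `ifelse()` call rewrites the whole column, so earlier group numbers would be overwritten anyway. Below is a minimal sketch of a loop that works for any number of groups, assuming the codes sit in the second column of each data frame as in the question. The column names `iso3` and `code` are made up for illustration.

```r
# Example data in the shape described in the question (codes stored as strings)
stataData <- data.frame(country = rep(c("246", "752", "643", "840"), each = 3))

# Hypothetical column names; only the position of the code column matters here
countries <- list(
  data.frame(iso3 = c("FIN", "SWE"), code = c("246", "752")),
  data.frame(iso3 = c("RUS", "USA"), code = c("643", "840"))
)

# Start with NA everywhere, then fill in only the rows that belong to group i
stataData$area <- NA_integer_
for (i in seq_along(countries)) {
  in_group <- stataData$country %in% countries[[i]][, 2]
  stataData$area[in_group] <- i
}
```

Rows whose country appears in none of the groups stay `NA`, and adding another data frame to `countries` creates the next group number without changing the loop.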

1 Answer

You can create a lookup table and then perform a merge:

lookup <- data.frame(country = c(246, 752, 643, 840), area = c(1, 1, 2, 2))
result <- merge(stataData, lookup, by = 'country')

#   country area
#1      246    1
#2      246    1
#3      246    1
#4      643    2
#5      643    2
#6      643    2
#7      752    1
#8      752    1
#9      752    1
#10     840    2
#11     840    2
#12     840    2
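
If the groups arrive as a list of data frames, as in the question, the lookup table does not have to be typed by hand. The following is a sketch, not part of the original answer: it stacks the list into one table, numbers each group by its position in the list, and uses `match()` instead of `merge()`, since the `merge()` result above comes back sorted by `country` rather than in the original row order. The column names `iso3` and `code` are assumptions for illustration.

```r
# Example data in the shape described in the question (codes stored as strings)
stataData <- data.frame(country = rep(c("246", "752", "643", "840"), each = 3))

# Hypothetical column names for the per-group data frames
countries <- list(
  data.frame(iso3 = c("FIN", "SWE"), code = c("246", "752")),
  data.frame(iso3 = c("RUS", "USA"), code = c("643", "840"))
)

# One row per country code; area is the position of its data frame in the list
lookup <- data.frame(
  country = unlist(lapply(countries, function(d) as.character(d$code))),
  area    = rep(seq_along(countries), times = sapply(countries, nrow))
)

# match() gives, for each row of stataData, the index of its row in lookup
stataData$area <- lookup$area[match(stataData$country, lookup$country)]
```

Codes that appear in no group get `NA`. Because everything is vectorised, this should also cope with a table of 1.5 million rows, though that has not been measured here.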

data

It is easier to help if you provide data in a reproducible format:

stataData <- data.frame(country = rep(c(246, 752, 643, 840), each = 3))
Ronak Shah