
I would like to create a dummy variable that equals 1 if an event happens in a capital city; my dataset contains 34 countries. Sometimes the city name also appears within a larger string (e.g. "Berlin, Germany, DE").

Let's say the column looks as follows:

      Location
1    Manchester
2    Berlin
3    Paris, France
4    Kansas

I would like the dummy to produce the following output:

      Location          Capital_Dummy
1    Manchester               0
2    Berlin                   1
3    Paris, France            1
4    Kansas                   0

Any idea about how I could do that?

I have tried the following (shortened for simplicity), hoping it would at least work for the cases in which only the name of the capital appears in the column, but it did not work even for those:

capital <- c("Madrid", "Berlin", "Paris", "Prague", "Bratislava")

capital_dummy[df$event_location == capital] <- 1
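For reference, the attempt above fails because == compares the two vectors element by element, recycling the shorter one, so each location is compared against only one capital. For cells that contain nothing but the city name, %in% tests membership against the whole capital vector instead. A minimal sketch on the example data, assuming the column is called Location as in the table above:

```r
# Example data from the question
df <- data.frame(Location = c("Manchester", "Berlin", "Paris, France", "Kansas"))
capital <- c("Madrid", "Berlin", "Paris", "Prague", "Bratislava")

# df$Location == capital would compare element by element, recycling the
# shorter vector (row 1 vs "Madrid", row 2 vs "Berlin", ...), and R warns
# because the lengths 4 and 5 are not multiples of each other.

# %in% checks each location against every capital (whole-string match)
df$Capital_Dummy <- as.integer(df$Location %in% capital)
df$Capital_Dummy
# [1] 0 1 0 0
```

"Paris, France" stays 0 here because %in% requires the entire cell to equal a capital, which is why a substring match such as grepl() is needed for this data.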

The solution to the question, proposed by David Arenburg:

capital <- c("Madrid", "Berlin", "Paris", "Prague", "Bratislava")

capital_dummy <- grepl(paste(capital, collapse = "|"), df$Location) + 0L
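A runnable version of this solution on the example data from the question, assuming the column is called Location as in the table above (the original attempt used df$event_location, so adjust the name to your data):

```r
# Example data from the question
df <- data.frame(Location = c("Manchester", "Berlin", "Paris, France", "Kansas"))

capital <- c("Madrid", "Berlin", "Paris", "Prague", "Bratislava")

# One alternation pattern: "Madrid|Berlin|Paris|Prague|Bratislava"
pattern <- paste(capital, collapse = "|")

# grepl() returns TRUE/FALSE per row; adding 0L coerces it to integer 1/0
df$Capital_Dummy <- grepl(pattern, df$Location) + 0L

df$Capital_Dummy
# [1] 0 1 1 0
```

grepl() returns TRUE wherever any of the capitals occurs anywhere in the string, which is why "Paris, France" is caught.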
Spl4t

1 Answer


Assuming the variable location contains unstructured text, you could use grepl to pattern-match your capitals:

df <- data.frame(location = c("Manchester", "Berlin", 
                              "Paris, France", "Kansas"))

capital <- c("Madrid", "Berlin", "Paris", "Prague", "Bratislava")

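# One row per location, one column per capital (TRUE if that capital matches).
# grepl() already matches substrings, so no "*" wildcards are needed;
# in a regex, a trailing "*" would only make the last character optional.
capital_dummy_matrix <- sapply(
    X = capital, # one pattern per capital
    FUN = grepl,
    x = df$location, 
    ignore.case = TRUE)

# For each row, max() of the logical values is 1 if any capital matched, else 0
df$capital_dummy <- apply(
    X = capital_dummy_matrix, 
    MARGIN = 1, 
    FUN = max)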

df

> df
       location capital_dummy
1    Manchester             0
2        Berlin             1
3 Paris, France             1
4        Kansas             0

This produces your desired output, though there may be a simpler solution if you provide more information about the structure of your data.
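One caveat with both grepl() approaches: because they match substrings, a short capital name can also match inside a longer place name. The sketch below adds a hypothetical row ("Bernburg", a German town) and adds Bern to the capital list purely for illustration, then wraps the alternation in word boundaries (\\b) so a capital only counts as a whole word:

```r
df <- data.frame(Location = c("Manchester", "Berlin", "Paris, France",
                              "Kansas", "Bernburg"))
# Hypothetical capital list including Bern, to show a false positive
capital <- c("Madrid", "Berlin", "Paris", "Prague", "Bratislava", "Bern")

# Plain alternation: "Bern" also matches inside "Bernburg"
plain <- grepl(paste(capital, collapse = "|"), df$Location) + 0L

# \\b requires a word edge on both sides; perl = TRUE uses PCRE
pattern <- paste0("\\b(", paste(capital, collapse = "|"), ")\\b")
df$Capital_Dummy <- grepl(pattern, df$Location, perl = TRUE) + 0L

plain
# [1] 0 1 1 0 1
df$Capital_Dummy
# [1] 0 1 1 0 0
```

A single combined pattern also avoids building the per-capital matrix, so no apply() step is needed.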

elikesprogramming