I am completely new to R (+ programming, data analysis, ...) and not really getting the hang of it.

I have a dataset with 5 columns. One column is "company names" and one is "sector", and each company is matched to 1 of 26 sectors. For my regression, I only want to distinguish between "environmentally sensitive" and "non-environmentally sensitive" sectors. The sector variable is of type character. Thus, I want to assign each of the 26 sectors to either "env_sensitive_sectors" or "nonenv_sensitive_sectors".

I have gone through forums and YouTube but I can't seem to find the code specific to my problem.

So far I have created a vector, but I don't even know if that is necessary. Should I create a new column so I can use this new column for my regression?

# Sectors into environmentally sensitive and non-environmentally sensitive

env_sensitive_sectors <- c("Airlines", "Energy", "GroundandMaritimeTransportation", "Healthcare", "Industrials", "Manufacturing", "Mining", "Materials", "TechnologyandTelecommunication")

nonenv_sensitive_sectors <- c("Agriculture", "Consumergoods", "ConsumerGoods", "ConsumerServices", "CosmeticIndustry", "Education", "Fashion", "FinancialServices", "InternationalOrganization", "LawFirms", "LuxuryGoods", "Media", "Municipality", "Non-GovernmentalOrganization", "ProfessionalServicesFirms", "PublicSector", "Publicsector")

(I realized too late that some sectors, e.g. "PublicSector", appear in the data with both a capital S and a lowercase s, so both spellings are listed on purpose.)

I feel like there is a very simple solution to this but I cannot find it. Can anyone help?

rnoob
  • Does this answer your question? [Filter multiple values on a string column in dplyr](https://stackoverflow.com/questions/25647470/filter-multiple-values-on-a-string-column-in-dplyr) – Paul Stafford Allen Jun 16 '23 at 08:40

1 Answer

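Something like this should do the job, using the dplyr package:

library(dplyr)

# Assumes the sector column is named `sector`; `sector_group` is an illustrative name
data <- data %>%
  mutate(sector_group = case_when(
    sector %in% env_sensitive_sectors ~ "environmentally sensitive",
    sector %in% nonenv_sensitive_sectors ~ "non-environmentally sensitive",
    TRUE ~ "Not in any vector"
  ))

This adds a new column, sector_group, which you can then use in your regression.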
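
If you would rather not depend on dplyr, the same idea can be sketched in base R. This version also lowercases the sector names before matching, so inconsistent capitalization such as "PublicSector" vs. "Publicsector" does not need duplicate entries. The toy data, the column names `company` and `sector`, and the new column `env_sensitive` are assumptions for illustration, and the two vectors are shortened:

```r
# Toy data; the column names `company` and `sector` are assumed
data <- data.frame(
  company = c("A", "B", "C", "D"),
  sector  = c("Energy", "Publicsector", "PublicSector", "Space"),
  stringsAsFactors = FALSE
)

# Shortened versions of the two vectors from the question
env_sensitive_sectors    <- c("Airlines", "Energy", "Mining")
nonenv_sensitive_sectors <- c("Education", "PublicSector")

# Lowercase both sides so capitalization differences do not matter
sector_lc <- tolower(data$sector)

# Start with NA so sectors found in neither vector are easy to spot
data$env_sensitive <- NA
data$env_sensitive[sector_lc %in% tolower(env_sensitive_sectors)]    <- TRUE
data$env_sensitive[sector_lc %in% tolower(nonenv_sensitive_sectors)] <- FALSE

# Quick check of how many rows landed in each group
table(data$env_sensitive, useNA = "ifany")
```

Any row that is still NA afterwards has a sector that is missing from both vectors, so it is worth checking those before running the regression. The logical column can then go straight into a model formula as a predictor.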

Quixotic22