I have a lot of data to work on, and to make things more efficient I would like to write code that assigns a regional code to each article based on its author's country of origin.

In other words, I have the following:

country$author_country

MEX

COL

TUN

GBR

USA

BRA

etc. 

I have created a column 'author_region' filled with NAs. I want to assign a region code to each of the author_country values. Instead of doing it by hand, for instance with something like if (country$author_country == 'MEX') {country$author_region <- 1},

I was hoping there is a way to create an object that would allow me to list all the countries from a region, and then assign a value to my author_region column based on whether or not author_country matches the content of this object. I thought about doing it like this:

LatAm <- list('COL', 'MEX', 'BRA')
for (i in country$author_country) if (country$author_country == LatAm)
{country$author_region[i] <- 1}

I know this looks wrong and it obviously does not work, but I couldn't find a solution to this issue. Could you help me please?

Thank you!!

  • Your proposed code isn't too far off: `country$author_region <- ifelse(country$author_country %in% LatAm, 1, country$author_region)` – dario Mar 10 '20 at 16:17
  • Just a suggestion: If you add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) you could make it easier for others to find and test an answer to your question. That way you can help others to help you! – dario Mar 10 '20 at 16:18
  • Do you already have data that would put a country in a region? If so, you could use one of `dplyr`'s `join` functions. – markhogue Mar 10 '20 at 16:18

1 Answer

A WORKAROUND:

There is a one-line workaround:

country$author_region = unclass(as.factor(country$author_country))

This solution assumes you don't care which country gets which code number. Note that it numbers each distinct country rather than grouping countries into regions. The operation above does the following:

  1. Converts author_country into a factor, whose levels are the distinct country codes in sorted order.
  2. Unclasses the factor. Unclassing turns a factor vector into an integer vector that encodes each factor level.
  3. Stores the result in author_region. The codes start from 1, since factor codes in R are 1-based, so no offset is needed.
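As a quick sketch of what the factor conversion yields, here it is applied to the six sample codes from the question. The data frame is rebuilt here only so the snippet runs on its own; note that R factor codes are 1-based.

```r
# Sample values taken from the question
country <- data.frame(
  author_country = c("MEX", "COL", "TUN", "GBR", "USA", "BRA"),
  stringsAsFactors = FALSE
)

# Levels are the distinct values in sorted order
f <- as.factor(country$author_country)
levels(f)

# Each value is encoded as the 1-based position of its level
codes <- as.integer(f)
codes

# The smallest code is 1, not 0
min(codes)
```

Every distinct country gets its own number, so this does not group countries into regions.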

IF A DATAFRAME THAT TELLS US THE CODE OF EACH COUNTRY IS AVAILABLE:

Say you have a dataframe country_codes with a column author_country holding the country and a column author_region holding the code you intend to use. You can then use a join:

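library(tidyverse)

# Drop the NA-filled placeholder column first so the join can add the real
# author_region, then match rows on author_country only.
country <- country %>%
  select(-author_region) %>%
  left_join(country_codes, by = "author_country")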

This is the better solution, since it lets you assign specific codes to specific countries as you wish.
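If no such lookup table exists yet, one way to get there from the question's idea of listing the countries per region is sketched below. Only "Latin America = 1" comes from the question; the other region names and codes are made up for illustration. The lookup uses base R `match()` as a dependency-free stand-in for the join.

```r
# Hypothetical region definitions: only LatAm (code 1) is from the question
regions <- list(
  LatAm   = c("COL", "MEX", "BRA"),
  Europe  = c("GBR"),
  NorthAm = c("USA"),
  Africa  = c("TUN")
)
region_codes <- c(LatAm = 1, Europe = 2, NorthAm = 3, Africa = 4)

# Flatten the list into a lookup table with one row per country
long <- stack(regions)  # columns: values (country), ind (region name)
country_codes <- data.frame(
  author_country = long$values,
  author_region  = unname(region_codes[as.character(long$ind)]),
  stringsAsFactors = FALSE
)

# Sample data from the question
country <- data.frame(
  author_country = c("MEX", "COL", "TUN", "GBR", "USA", "BRA"),
  stringsAsFactors = FALSE
)

# Look up each author's country; countries missing from the table stay NA
idx <- match(country$author_country, country_codes$author_country)
country$author_region <- country_codes$author_region[idx]
country
```

Adding a country or a region then means editing only the `regions` list, not the assignment code.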

Nuclear03020704