I am very new to R and am working on a project where the values in one column are different countries. However, some are spelled or named differently. For example, some values for "United States" would be "USA", "Ahem....Amerca", "'merica", "USSA", "USAUSAUSA". I want to rename them all to "United States", keep only the rows whose values are or resemble "United States", "Canada", and "United Kingdom", and delete every other row completely.

I have been thinking about using multiple if statements inside a for loop, or using case_when, but I'm not entirely sure how to actually write the code to do it.

I'm looking to search through one column with 2460 rows for specific words and letter sequences and change them to "United States".

CandyData <- #is the dataframe the column is in

for ( row in 1:length(CandyData))
{
  if (x == "USA"| "Ahem...Amerca"|"merica"|"USSA"|"USAUSAUSA")
{x = "United States" }
else if 
{x.omit }
}

I don't really have any errors because I haven't been able to make it work properly.

Felix Chan
  • Please update your post with a reproducible example covering all the cases you have with an expected output. Also read [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Ronak Shah Jun 17 '19 at 04:13
  • @RonakShah Hi, I'm new to Stack Overflow, so I'm not entirely sure how to include a small dataset in my question – Felix Chan Jun 17 '19 at 04:23
  • Don't use `==` for more than one thing, use `%in%` – morgan121 Jun 17 '19 at 05:06
  • to include data type `dput(head(data))` and paste the output into your question – morgan121 Jun 17 '19 at 05:06
  • @FelixChan Hi, it is best to provide an [MCVE](https://stackoverflow.com/help/minimal-reproducible-example), i.e. a toy example that is as minimal as possible but still reproduces your issue while being copy-pastable and immediately runnable. – jay.sf Jun 17 '19 at 05:22
  • When I pasted my dput(head(data)), this is what I received: "structure(c("function (..., list = character(), package = NULL, lib.loc = NULL, ", " verbose = getOption(\"verbose\"), envir = .GlobalEnv, overwrite = TRUE) ", "{", " fileExt <- function(x) {", " db <- grepl(\"\\\\.[^.]+\\\\.(gz|bz2|xz)$\", x)", " ans <- sub(\".*\\\\.\", \"\", x)"), .Dim = c(6L, 1L), .Dimnames = list( c("1", "2", "3", "4", "5", "6"), ""), class = "noquote")" – Felix Chan Jun 17 '19 at 07:05
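
The `%in%` suggestion from the comments can be sketched roughly as follows. In the question's condition, `x == "USA" | "Ahem...Amerca"` is parsed as `(x == "USA") | "Ahem...Amerca"`, and `|` does not accept a character operand, so R raises an error instead of comparing `x` against each spelling. The vector `x` and its contents below are made up for illustration; they are not the asker's data.

```r
# Hypothetical toy vector standing in for the country column
x <- c("USA", "Canada", "USSA", "Mexico")

# Spellings that should all become "United States" (taken from the question)
variants <- c("USA", "Ahem...Amerca", "merica", "USSA", "USAUSAUSA")

# %in% checks every element of x for membership in variants,
# returning one TRUE/FALSE per element
is_us <- x %in% variants

# Replace only the matching elements; no loop is needed
x[is_us] <- "United States"
x
```

Note that `%in%` matches whole strings exactly, so a spelling such as "'merica" (with the leading apostrophe) has to appear in `variants` exactly as it appears in the data.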

1 Answer

As mentioned before, it is very important to provide a reproducible example. It helps people who may want to help you! Otherwise, it could be time-consuming...

That said, you don't need a loop for this (in fact, it's generally not advisable to use loops in R unnecessarily, since vectorized alternatives usually work better).

Let's assume that the column you're working with is called "Country":

# Update wrong values
wrong_names <- c("USA", "Ahem...Amerca", "merica", "USSA", "USAUSAUSA")
CandyData$Country[CandyData$Country %in% wrong_names] <- "United States"

# Select lines for target-countries only
CandyData[CandyData$Country %in% c("United States", "Canada", "United Kingdom"), ]
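
One thing to keep in mind is that the subsetting expression above returns a new data frame and leaves `CandyData` itself unchanged, so the result has to be assigned back for the other rows to be dropped. A minimal end-to-end sketch follows, assuming a made-up toy data frame (the real data was not posted) and, as an extra that is not part of the code above, normalising case and surrounding whitespace before matching:

```r
# Hypothetical toy data; the real CandyData has 2460 rows
CandyData <- data.frame(
  Country = c("USA", "canada", " United Kingdom", "'merica", "Germany", "USSA"),
  stringsAsFactors = FALSE
)

# Normalise surrounding whitespace and case before comparing
cleaned <- tolower(trimws(CandyData$Country))

# Lower-case spellings treated as the United States; extend as needed
us_variants <- c("usa", "ahem...amerca", "'merica", "merica",
                 "ussa", "usausausa", "united states")

# Vectorised replacement of each group of spellings
CandyData$Country[cleaned %in% us_variants] <- "United States"
CandyData$Country[cleaned == "canada"] <- "Canada"
CandyData$Country[cleaned == "united kingdom"] <- "United Kingdom"

# Assign the subset back so that all other rows are actually removed
keep <- c("United States", "Canada", "United Kingdom")
CandyData <- CandyData[CandyData$Country %in% keep, , drop = FALSE]
```

The `drop = FALSE` argument keeps the result a data frame even when, as in this toy example, there is only one column.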

I hope it helps!
Best

Cainã Max Couto-Silva
  • Hi, this really helped, thank you so much. However, the last portion doesn't seem to delete the rows that aren't "United States", "Canada", or "United Kingdom". Is there a way to do this? – Felix Chan Jun 17 '19 at 08:42