
I'm working with some data from a Udacity course (link: Reddit Survey Responses). I'm trying to simplify the employment status variable by replacing any multi-word values with single-word alternatives, using:

RS$employment.status <- ifelse(RS$employment.status == "Not employed, but looking for work",
                               "Unemployed", RS$employment.status)

However, when I run the code, the values that aren't supposed to be replaced come back as numbers. Since the else case just uses the field's existing value, I'm not sure why the text isn't preserved as-is.

Here's a screenshot of the initial data: [screenshot not available]

And after running the code: [screenshot not available]

So if anyone could point out

  1. why the substitution is being made when it doesn't look like it should be;
  2. what would be the correct way to accomplish what I'm trying to achieve;

it would be much appreciated.

Asked by JMichael; edited by lmo.
  • Please read [about how to provide a minimum, working, reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Aug 11 '15 at 20:40
  • What is the type of the column? You are replacing your data with a character value, but your current column may be a factor. – Michal Aug 11 '15 at 20:40
  • I couldn't easily find a duplicate for this problem, which is one that might be tricky for beginners to diagnose. This question will likely be of use to future searchers, though it would be better with a minimum example of your data, code, & results. – Sam Firke Aug 11 '15 at 20:47
  • [This](http://stackoverflow.com/questions/15912210/replace-a-list-of-values-by-another-in-r/15912309#15912309) is a different approach to a number of `ifelse()` statements; you'll want to make sure you coerce the original column to character() before doing the map. – Martin Morgan Aug 11 '15 at 20:52
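Following the two comments above (check the column type, then coerce to character before mapping), here is a minimal sketch of recoding several values at once with a named lookup vector. The factor contents and the `recode_map` table are hypothetical stand-ins for `RS$employment.status`, and this is one common base-R idiom, not necessarily the exact method in the linked answer.

```r
# Hypothetical stand-in for RS$employment.status, read in as a factor
status <- factor(c("Employed full-time",
                   "Not employed, but looking for work",
                   "Student",
                   "Employed full-time"))

# Check the column type first: this is the source of the numeric output
print(class(status))
# [1] "factor"

# Coerce to character so that labels, not integer codes, are recoded
status_chr <- as.character(status)

# Hypothetical lookup table: old multi-word value -> new single-word value
recode_map <- c("Not employed, but looking for work" = "Unemployed",
                "Employed full-time"                 = "Employed")

# Replace only the values that appear in the map; leave the rest as-is
hit <- status_chr %in% names(recode_map)
status_chr[hit] <- unname(recode_map[status_chr[hit]])
print(status_chr)
# [1] "Employed"   "Unemployed" "Student"    "Employed"
```

Adding a new recode is then a one-line change to `recode_map` rather than another nested `ifelse()`.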

1 Answer


The problem is that this variable is stored as a factor. To fix it, you can either add the argument `stringsAsFactors = FALSE` when you read in your data, or you could do this:

  RS$employment.status <- ifelse(RS$employment.status == "Not employed, but looking for work",
                                 "Unemployed", as.character(RS$employment.status))
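A minimal reproducible sketch of the problem and the fix, using a hypothetical three-value factor in place of `RS$employment.status`. When the `no` argument of `ifelse()` is a factor, its elements are copied into the result by their underlying integer codes, so the untouched rows come back as level numbers (as strings here, because "Unemployed" has already made the result a character vector). Wrapping the column in `as.character()`, as in the answer, keeps the labels. Relabeling the factor's levels is shown as a further alternative that keeps the column a factor.

```r
# Hypothetical stand-in for RS$employment.status; levels sort alphabetically,
# so the codes are 1 = "Employed full-time", 2 = "Not employed, ...", 3 = "Student"
status <- factor(c("Employed full-time",
                   "Not employed, but looking for work",
                   "Student"))

# Original approach: the "no" branch contributes integer codes, not labels
bad <- ifelse(status == "Not employed, but looking for work",
              "Unemployed", status)
print(bad)
# [1] "1"          "Unemployed" "3"

# Fix from the answer: convert to character before using it in ifelse()
good <- ifelse(status == "Not employed, but looking for work",
               "Unemployed", as.character(status))
print(good)
# [1] "Employed full-time" "Unemployed"         "Student"

# Alternative: rename the level itself, keeping the column a factor
fixed <- status
levels(fixed)[levels(fixed) == "Not employed, but looking for work"] <- "Unemployed"
print(as.character(fixed))
# [1] "Employed full-time" "Unemployed"         "Student"
```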
Answered by Andrelrms.
  • Thanks! After Michal's comment I checked, and they are being imported as factors. For future reference, is there a way to force R to NEVER default to factors (for strings, ints, floats/doubles, etc.)? – JMichael Aug 11 '15 at 20:46
  • @JMichael you can add `options("stringsAsFactors" = FALSE)` at the beginning of your code every time. I don't know if there is a way to do it for every R session. – Andrelrms Aug 11 '15 at 20:51
  • @JMichael Yes, you can put that in your .Rprofile. However, be aware that writing code under those circumstances will make it very dangerous to share your code with others. There will likely be strange bugs because they haven't set that option and it will be very difficult to diagnose. – joran Aug 11 '15 at 21:58
  • Another option is to read in your data using functions from the [`readr` package](http://blog.rstudio.org/2015/04/09/readr-0-1-0/), which by default do not convert strings to factors (and, as a bonus, `readr` is much faster than the base functions for reading data). – eipi10 Aug 11 '15 at 23:33
  • @eipi10 Thanks! Figured there was something like that out there. Just hadn't gotten that far yet. I'll be installing that now. – JMichael Aug 13 '15 at 17:21
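A minimal sketch of the per-call approach from the answer: pass `stringsAsFactors = FALSE` to `read.csv()` directly, which avoids the portability problem joran describes with a global option in `.Rprofile`. The inline CSV text is a hypothetical stand-in for the survey file. Since R 4.0.0 this is already the default for `read.csv()` and `data.frame()`, so the explicit argument mainly matters on older versions, but it is harmless on newer ones.

```r
# Hypothetical inline CSV standing in for the survey file
csv_text <- "id,employment.status
1,Employed full-time
2,\"Not employed, but looking for work\"
3,Student"

# Keep text columns as character vectors regardless of the R version
RS <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(class(RS$employment.status))
# [1] "character"

# The original ifelse() now works without as.character()
RS$employment.status <- ifelse(RS$employment.status == "Not employed, but looking for work",
                               "Unemployed", RS$employment.status)
print(RS$employment.status)
# [1] "Employed full-time" "Unemployed"         "Student"
```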