I'm trying to convert a categorical variable into a factor in R after using the melt() function to convert from wide to long format. However, when I run the factor function and supply levels and labels, every value comes back as <NA> (see the output below).

Does anyone know why this is happening?

law <- read.csv("lawyers_class_new.csv")


library(reshape2)
law <- melt(law, id.vars = c("Subj"), measure.vars = c("lawyer", "neutral", "engineer", "neutral_urb", "neutral_rur"))
law <- law[order(law$Subj),]
law <- within(law,
              Subj <- factor(Subj),
              variable <- factor(variable)
              )
law$variable<- ordered(law$variable,levels=c(1,2,3,4,5),labels=c("lawyer","neutral",
    "engineer","neutral_urb","neutral_rur"))


Output: 

law$variable
  [1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>     <NA> <NA> <NA> <NA>
 [18] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
 [35] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
 [52] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
 [69] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
 [86] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[103] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[120] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[137] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>

MELTED DATA FRAME:

Subj  Cond    variable    value
1         2       lawyer      3
1         3      neutral      1
1         1      engineer     3.5
1         5      neutral_urb  3
1         4      neutral_rur  3.5
2         2      lawyer       1
2         3      neutral      3.5
2         1      engineer     4.5
2         5      neutral_urb  2
2         4      neutral_rur  3.5

ORIGINAL DATA FRAME:

Subj    lawyer  neutral engineer    neutral_urb neutral_rur
1          3       1      3.5           3          3.5
2          1     3.5      4.5           2          3.5
  • Please make a reproducible example. We do not have access to lawyers_class_new.csv. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Pierre Lapointe Mar 12 '17 at 18:44
  • It seems like the levels aren't `1:5` at the second conversion to ordered factor. The levels argument should be *what the factor levels appear as*, and the labels are optional only if you want to change them to something else. – Gregor Thomas Mar 12 '17 at 18:47
  • Also, I don't know your goals, but many people mistakenly think that an ordered factor is necessary for having levels in a specific order, e.g., for plotting. That is not the case. The only reason for an `ordered` factor is for the contrasts used when modeling. – Gregor Thomas Mar 12 '17 at 18:49
  • @ P Lapointe: Sorry, but I don't know how I would do that? I don't have access to a server to post the data set. – DartmouthDude82 Mar 12 '17 at 18:49
  • @Gregor: I'm trying to convert the levels to numeric values. I don't want non-numeric values. After I figure this out, I will be doing a linear mixed model on these data and that requires that I have contrasts. When I try to run contrasts on "variable" as is, I get an error message. So my thought was that "variable" needed to be converted to a numeric factor in order to do my analysis. Is this an incorrect assumption? – DartmouthDude82 Mar 12 '17 at 18:53
  • @DartmouthDude82 It's best to put a few lines of data. Using 'dput' is efficient. – Pierre Lapointe Mar 12 '17 at 18:56
  • @P Lapointe: Done! – DartmouthDude82 Mar 12 '17 at 18:59
  • @P Lapointe: I added snippets from the original and melted data frames: Does that help? – DartmouthDude82 Mar 12 '17 at 19:05
  • So, from my first comment, the `levels` argument needs to be the factor levels *as they are* and the `labels` argument should be *what you want them to be*. So if they are text, and you want them numeric, the `levels` argument should be the text and the `labels` argument should be `1:5`. – Gregor Thomas Mar 12 '17 at 19:31
  • In all cases you will have contrasts. The contrasts for a non-ordered factor will be 2v1, 3v1, 4v1, ... . For an ordered factor the contrasts will be orthogonal polynomials. – Gregor Thomas Mar 12 '17 at 19:33
  • @Gregor: Thank you! I took your advice, but now my contrasts are not working. I'm getting the following error: `Error in get(ctr, mode = "function", envir = parent.frame()) : object 'cont_1' of mode 'function' was not found`. I know that I set my contrasts correctly, as I've done it a thousand times. Something weird is going on. – DartmouthDude82 Mar 12 '17 at 20:06
  • Well, probably the problem is in some code you haven't shown. – Gregor Thomas Mar 12 '17 at 20:32
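
The point made in the comments about `levels` versus `labels` can be sketched with a toy vector. This is only an illustration: `x` is a made-up stand-in for `law$variable` after `melt()`, and `conds`, `bad`, and `good` are names introduced here, not taken from the original code.

```r
# Toy stand-in for law$variable after melt(); the values are illustrative only.
conds <- c("lawyer", "neutral", "engineer", "neutral_urb", "neutral_rur")
x <- rep(conds, times = 2)

# levels = 1:5 never matches the text values in x, so every element becomes NA.
bad <- factor(x, levels = 1:5, labels = conds)
print(all(is.na(bad)))

# levels = the values as they are, labels = what you want them to be.
good <- factor(x, levels = conds, labels = 1:5)
print(levels(good))
print(as.integer(good))
```

In the first call the labels are attached to levels that never occur in the data, which is consistent with the all-`<NA>` output in the question. In the second call the text values are matched first and only then relabelled.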

1 Answer

To minimize errors, I would not import character columns as factors. It also seems that the call to within() does not set up law$variable the way you intended. I would therefore create the factors explicitly, as below, listing the levels as they appear in the data so that they end up in the correct order.

law  <- read.table(text="Subj  Cond    variable    value
1         2       lawyer      3
1         3      neutral      1
1         1      engineer     3.5
1         5      neutral_urb  3
1         4      neutral_rur  3.5
2         2      lawyer       1
2         3      neutral      3.5
2         1      engineer     4.5
2         5      neutral_urb  2
2         4      neutral_rur  3.5", header=TRUE, stringsAsFactors=FALSE)

law <- law[order(law$Subj),]

law$Subj <- as.factor(law$Subj)
law$variable <- factor(law$variable,
                       levels = c("lawyer", "neutral", "engineer",
                                  "neutral_urb", "neutral_rur"))

str(law)
'data.frame':   10 obs. of  4 variables:
 $ Subj    : Factor w/ 2 levels "1","2": 1 1 1 1 1 2 2 2 2 2
 $ Cond    : int  2 3 1 5 4 2 3 1 5 4
 $ variable: Factor w/ 5 levels "lawyer","neutral",..: 1 2 3 4 5 1 2 3 4 5
 $ value   : num  3 1 3.5 3 3.5 1 3.5 4.5 2 3.5
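
Since the question goes on to contrasts for a mixed model, here is a small sketch of what the comments describe: an unordered factor gets treatment contrasts and an ordered factor gets polynomial contrasts. It assumes R's default `options(contrasts = c("contr.treatment", "contr.poly"))`. The names `conds`, `f`, and `o` are illustrative and not part of the original data.

```r
# Illustrative five-level factor built from the condition names.
conds <- c("lawyer", "neutral", "engineer", "neutral_urb", "neutral_rur")

f <- factor(conds, levels = conds)                  # unordered factor
o <- factor(conds, levels = conds, ordered = TRUE)  # ordered factor

# Assumes the default contrasts option has not been changed in this session.
print(contrasts(f))  # treatment contrasts: each level compared with the first
print(contrasts(o))  # orthogonal polynomial contrasts
```

Either way the factor carries contrasts without being converted to numeric codes. Whether an ordered factor is appropriate depends on whether the conditions have a meaningful order.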
Pierre Lapointe