I have some very simple code and do not understand why it is not working the way I want. Basically, I have a data frame and want to capture the value of the n-th element of a column in the data frame and store it in a vector. Here is my code:

COL1_VALUES <- c("ABC","XYZ","PQR")
COL2_VALUES <- c("DEF","JKL","TSM")

means <- data.frame(COL1_VALUES,COL2_VALUES)

for (i in 1:nrow(means)) {
    COL1_VALUES[i] <- means$COL1[i];
    COL2_VALUES[i] <- means$COL2[i];
}

print(means$COL1)
print(COL1_VALUES)

This outputs:

[1] ABC XYZ PQR
Levels: ABC PQR XYZ
[1] "1" "3" "2"

Why am I not getting ABC XYZ PQR in the vector COL1_VALUES? It appears that 1, 3, 2 are the indices of ABC, XYZ, PQR among the levels of means$COL1. What do I need to do to get ABC XYZ PQR in the vector COL1_VALUES?

Thanks.

David Arenburg
Baykal
  • `COL1_VALUES` as a vector is a `character` vector. When it is put into a `data.frame()` it gets converted to a `factor` (since `data.frame(...,stringsAsFactors=TRUE)` by default). I think this explains your difference. – thelatemail Jun 03 '15 at 23:24
  • I don't understand why you are doing this in the first place. Your code does practically nothing useful – David Arenburg Jun 03 '15 at 23:25
  • @DavidArenburg this is 100 times simplified version of my actual code to provide an example case here. – Baykal Jun 03 '15 at 23:26
  • @Baykal your `means$COL1` was converted to a factor. (A factor is basically an integer with a text label.) Try: `means <- data.frame(COL1_VALUES,COL2_VALUES, stringsAsFactors=FALSE)` – akhmed Jun 03 '15 at 23:30
  • Terry Therneau, longtime S/R user and survival package author, says his R shop has mandated `options(stringsAsFactors = FALSE)` as part of their `.Rprofile.site`. – IRTFM Jun 03 '15 at 23:39
  • @akhmed that worked. Thanks a lot. Why don't you convert that to an answer and I can mark it. – Baykal Jun 04 '15 at 00:08
  • I agree -- this question does deserve a full answer especially if it is rephrased into a more general one/easily searchable. (For example, "Why is character vector displayed as integers when added to a data.frame?" or so). `stringsAsFactors=TRUE` can be a huge source of confusion for beginners and affects other functions as well (see my answer). – akhmed Jun 04 '15 at 00:56
  • @Baykal actually that answer was provided by thelatemail in his first comment; not sure what was so special about akhmed's comment. – David Arenburg Jun 04 '15 at 07:55

1 Answer

In R, the data.frame() function ships with a default setting of stringsAsFactors=TRUE (this was the default before R 4.0.0, which changed it to FALSE). This means that all input character vectors are implicitly converted into so-called "factors" when creating a data.frame.

A factor is somewhat like a vector of integers plus text labels that describe those integers. For example, if a column gender has type factor, it is actually a vector of integers with 1s and 2s plus an attached dictionary saying that category id 1 means Male and category id 2 means Female, or vice versa.
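To make the "integers plus labels" idea concrete, here is a minimal sketch that builds a factor by hand from the values in the question (the variable name `f` is only for illustration). It shows where the 1, 3, 2 in the question's output come from: the levels are sorted, and each value is stored as its position in that sorted list.

```r
# Build a factor explicitly, so the result does not depend on any
# stringsAsFactors default.
f <- factor(c("ABC", "XYZ", "PQR"))

levels(f)        # the sorted labels: "ABC" "PQR" "XYZ"
as.integer(f)    # the underlying codes: 1 3 2
as.character(f)  # the labels recovered as text: "ABC" "XYZ" "PQR"
```

If you are stuck with a factor column, `as.character()` is the usual way to get the text back rather than the codes.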

This default setting on stringsAsFactors is a sneaky beast and can show up in numerous unexpected locations. In most of these cases, it helps just to add an explicit stringsAsFactors=FALSE option so as to keep character vectors as character vectors.

Below I list the functions that I personally struggled with until realising that all I was missing was the stringsAsFactors=FALSE option:

  • data.frame
  • read.csv, read.table and other read.* functions
  • expand.grid
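As a rough illustration of passing the option to two of the other functions in this list, the sketch below reads a small inline CSV and builds a grid. The CSV text and the names `csv_text`, `scores` and `grid` are made up for this example.

```r
# Hypothetical inline CSV; read.csv() accepts it via the `text` argument.
csv_text <- "name,score\nABC,1\nXYZ,2"
scores <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# expand.grid() takes the same argument.
grid <- expand.grid(letter = c("a", "b"), n = 1:2,
                    stringsAsFactors = FALSE)

class(scores$name)  # "character"
class(grid$letter)  # "character"
```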

In your specific example above, what you need to do is find this line:

means <- data.frame(COL1_VALUES,COL2_VALUES)

and replace it with:

means <- data.frame(COL1_VALUES,COL2_VALUES,
                     stringsAsFactors=FALSE)

such that you are explicitly requesting data.frame() not to do any implicit conversions behind your back.
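A quick way to check the effect is to rerun the loop from the question against the corrected data frame. This is only a sketch: `col1_copy` is a hypothetical result vector introduced here so the original input vector is not overwritten, and the column is referred to by its full name.

```r
COL1_VALUES <- c("ABC", "XYZ", "PQR")
COL2_VALUES <- c("DEF", "JKL", "TSM")

means <- data.frame(COL1_VALUES, COL2_VALUES,
                    stringsAsFactors = FALSE)

# Pre-allocate a character vector and copy the column element by element.
col1_copy <- character(nrow(means))
for (i in seq_len(nrow(means))) {
  col1_copy[i] <- means$COL1_VALUES[i]
}

class(means$COL1_VALUES)  # "character"
print(col1_copy)          # the original strings, not integer codes
```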

You can also avoid this conversion by changing the global option at the beginning of each R session:

options(stringsAsFactors = FALSE)

Note, however, that modifying this global option only affects your machine and snippets of your code may stop working on the machines of others.
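If you would rather not depend on the global option at all, one possible approach is to convert any factor columns back to character after the data frame exists. The sketch below creates the factors on purpose so it behaves the same regardless of defaults; `is_fct` is just an illustrative helper name.

```r
# Force factor columns so the example is independent of any default.
means <- data.frame(COL1_VALUES = c("ABC", "XYZ", "PQR"),
                    COL2_VALUES = c("DEF", "JKL", "TSM"),
                    stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character.
is_fct <- vapply(means, is.factor, logical(1))
means[is_fct] <- lapply(means[is_fct], as.character)

str(means)  # both columns are now chr
```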

This answer contains more information about how to disable it permanently.

akhmed