
Short version:

I do not understand the behaviour of as.character when trying to convert a single row of a data frame to a character vector.

> mydf <- data.frame("myvar1"=c("mystring","2"),"myvar2"=c("mystring","3"))
> mydf # nice!
    myvar1   myvar2
1 mystring mystring
2        2        3
> as.character(mydf[1,])
[1] "2" "2"
> as.character(as.vector(mydf[1,]) ) 
[1] "2" "2"

Could somebody explain the last two outputs and suggest the correct approach? Thanks a lot.

Background/Purpose:

I want to use rle() to detect consecutive occurrences of the same value in a row of a data frame (whose columns have different data types).

Problem: rle() requires a vector, and a vector must have a single data type (integer, character, factor, ...). My idea is to turn the data frame row into a character vector to avoid data loss through conversion.
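A minimal sketch of the stated goal, assuming the comparison should be made on each cell's text form: convert each cell of the row separately (so factor columns contribute their labels, not their integer codes) and pass the result to rle(). The data frame `df` and the helper `row_to_character()` are hypothetical. Keep in mind that as.character() keeps only about 15 significant digits for doubles, so two numbers that differ only beyond that precision would compare equal as strings.

```r
# Hypothetical one-row example with mixed column types; factor columns are
# requested explicitly so the result does not depend on the R version.
df <- data.frame(a = "x", b = "x", c = 3, d = 3, e = "x",
                 stringsAsFactors = TRUE)

# Hypothetical helper: convert each cell of row i to character on its own.
# as.character() on a factor returns its label; on a number, its text form.
row_to_character <- function(d, i) {
  unname(vapply(d[i, , drop = FALSE], as.character, character(1)))
}

r <- row_to_character(df, 1)   # c("x", "x", "3", "3", "x")
runs <- rle(r)                 # run-length encoding of consecutive equal values
runs$lengths                   # c(2L, 2L, 1L)
runs$values                    # c("x", "3", "x")
```

Converting cell by cell avoids the problem in the question, where as.character() is applied to the whole one-row data frame at once.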

nilsole

2 Answers


Your data frame columns aren't characters; they are factors.

When you create a data frame, character vectors are converted to factors by default (this was the default until R 4.0.0, which changed stringsAsFactors to FALSE). You can see this clearly if you select a column:

R> mydf[,1]
[1] mystring 2       
Levels: 2 mystring
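A short sketch that inspects the factor directly, to show where the "2" "2" in the question comes from. It assumes factor columns are wanted, so stringsAsFactors = TRUE is passed explicitly to reproduce the question on R 4.0.0 and later.

```r
mydf <- data.frame("myvar1" = c("mystring", "2"),
                   "myvar2" = c("mystring", "3"),
                   stringsAsFactors = TRUE)

# Levels are sorted, so "2" comes before "mystring".
levels(mydf$myvar1)                                  # c("2", "mystring")

# Internally a factor stores integer codes that index into its levels.
as.integer(mydf$myvar1)                              # c(2L, 1L)

# as.character() on the factor column itself returns the labels.
as.character(mydf$myvar1)                            # c("mystring", "2")

# Row 1 holds code 2 in both columns, matching the "2" "2" seen in the question.
unname(vapply(mydf[1, ], as.integer, integer(1)))    # c(2L, 2L)
```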

To avoid this behaviour, set the stringsAsFactors argument to FALSE:

mydf = data.frame("myvar1"=c("mystring", "2"),
                  "myvar2"=c("mystring", "3"),
                  stringsAsFactors=FALSE)
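If the data frame already exists with factor columns, one possible approach (not from the answer above, just a common base-R idiom) is to replace every factor column with its character labels. The names `mydf` and `is_fac` are illustrative.

```r
# A data frame whose string columns are already factors (requested explicitly
# so the example does not depend on the R version's default).
mydf <- data.frame("myvar1" = c("mystring", "2"),
                   "myvar2" = c("mystring", "3"),
                   stringsAsFactors = TRUE)

# Find the factor columns and replace them with their character labels.
# Assigning into mydf[is_fac] keeps the data.frame structure and column names.
is_fac <- vapply(mydf, is.factor, logical(1))
mydf[is_fac] <- lapply(mydf[is_fac], as.character)

str(mydf)
as.character(mydf[1, ])   # c("mystring", "mystring")
```

Only factor columns are touched, so numeric columns keep their type.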

You should also look at this question: How to convert a data frame column to numeric type?

csgillespie
  • Thank you! I guess I have to take some lessons in data types. (y) – nilsole Jun 30 '14 at 14:40
  • btw cs + @coffeinjunky: Do you think it is a good idea to turn data (integers, floats, ...) into characters in order to avoid data loss through conversion? Or does R offer better ways? thx :) – nilsole Jun 30 '14 at 14:44
  • R is used by professional statisticians. I don't think you have to worry about data loss - if you do you are probably doing it wrong ;) – csgillespie Jun 30 '14 at 14:48
  • I don't really see any advantages but many disadvantages of converting your data to characters, so I would not advise doing so. – coffeinjunky Jun 30 '14 at 14:52
  • OK, the conversion is only for the purpose of using `rle()`, to bring the row's values to a common type and detect identical values. Ideas on better ways? :) – nilsole Jun 30 '14 at 14:58

Try this:

mydf <- data.frame("myvar1"=c("mystring","2"),"myvar2"=c("mystring","3"), stringsAsFactors=FALSE)
as.character(mydf[1,])
[1] "mystring" "mystring"

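Your strings have been coerced into factors, and what you were shown are the factors' integer codes, not their labels. Because the levels are sorted, "mystring" is level 2 in both columns, hence "2" "2".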

coffeinjunky