> test <- data.frame()
> test<-rbind(test,c("hi","i","am","bob"))
> test<-rbind(test,c("hi","i","am","alice"))
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "alice") :
  invalid factor level, NAs generated

Why does this minimal example produce that warning? I want to append several rows of strings to an empty data frame.

Sven Hohenstein
user3182532
  • Please help us help you by providing a reproducible example (i.e. code and example data); see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for details. – Paul Hiemstra Jan 12 '14 at 12:04
  • You are doing something you shouldn't do and shouldn't need to do, i.e., you should never create an empty object and grow it in a loop. I also don't understand why you use `sprintf` for this. Make your input numeric and use `round` or `signif` if you must. – Roland Jan 12 '14 at 12:10
  • Hi Paul, I have just noticed that I can basically reduce my question to: why does data4plotting<-data.frame() plus data4plotting<-rbind(data4plotting, c("hi","I","am","Bob")) result in a data frame with factors? I want it to result in a data frame of strings! – user3182532 Jan 12 '14 at 12:14
  • Hi Roland, but I have to do big calculations and get the summary of several of those calculations in a data frame. How can I achieve this if not this way? I use sprintf because I need to format a value. I cannot make the input numeric, because the first 2 columns must be strings. So it shouldn't matter if the 3rd and 4th columns are also strings (by using sprintf), or am I misunderstanding? – user3182532 Jan 12 '14 at 12:18
  • Please produce a reproducible example (follow the link in the first comment to learn how), so we can show you better ways to achieve your desired result. – Roland Jan 12 '14 at 12:25
  • Okay now I have tracked down the actual problem, I will edit my opening post to the actual underlying question! – user3182532 Jan 12 '14 at 12:31
  • You are looking for `options(stringsAsFactors=FALSE)`. However, I can only strongly reiterate that your whole approach is the least efficient you could use to get your final result. State your whole problem, so people can show you better possibilities. – Roland Jan 12 '14 at 12:37
  • Hi Roland, that doesn't solve the problem, or am I applying it wrongly? I have inserted that global option before the lines of the example above. The whole problem is: I have a set of programs, namely bowtie, bwa, masai and razers. I also have a set of datasets: chr1, chr1_s, chrMT and helico. Every program gives me output for every dataset, and I need to process these outputs in the for-loop to obtain 2 values for each combination of program and dataset. So at the end I want to save 2 values for each combination. In the resulting data frame I want to see: V1=program, V2=dataset, V3=value1 and V4=value2. – user3182532 Jan 12 '14 at 12:49
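
The advice in the comments above (do not grow a data frame row by row) can be sketched for the program/dataset use case roughly as follows. This is only an illustration: `compute_values()` is a hypothetical placeholder for the real processing step, and the column names `program`, `dataset`, `value1` and `value2` are assumed here in place of V1 to V4. Each result is stored in a preallocated list and the rows are bound once at the end.

```r
programs <- c("bowtie", "bwa", "masai", "razers")
datasets <- c("chr1", "chr1_s", "chrMT", "helico")

# Hypothetical stand-in for the real computation; returns two named values
compute_values <- function(program, dataset) {
  c(value1 = nchar(program), value2 = nchar(dataset))
}

# Preallocate one list slot per program/dataset combination
rows <- vector("list", length(programs) * length(datasets))
k <- 0
for (p in programs) {
  for (d in datasets) {
    k <- k + 1
    v <- compute_values(p, d)
    # Build a one-row data frame; keep strings as character, not factor
    rows[[k]] <- data.frame(program = p, dataset = d,
                            value1 = v[["value1"]], value2 = v[["value2"]],
                            stringsAsFactors = FALSE)
  }
}

# Bind all rows in a single call instead of once per iteration
results <- do.call(rbind, rows)
```

Keeping the values numeric until the end means any formatting (for example with `sprintf`) can be applied once to a whole column.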

2 Answers


You can store your information in a character matrix. You can then convert this matrix into a data frame using as.data.frame with the argument stringsAsFactors = FALSE.

> test <- matrix(c("hi","i","am","bob"), nrow = 1)
> test <- rbind(test, c("hi","i","am","alice"))
> test
     [,1] [,2] [,3] [,4]   
[1,] "hi" "i"  "am" "bob"  
[2,] "hi" "i"  "am" "alice"

> testDF <- as.data.frame(test, stringsAsFactors = FALSE)
> testDF <- rbind(testDF, c("hi","i","am","happy"))
> testDF
  V1 V2 V3    V4
1 hi  i am   bob
2 hi  i am alice
3 hi  i am happy
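
To check that the conversion really keeps the strings as character columns rather than factors, the column classes can be inspected before appending more rows. A small sketch reusing the objects from the example above; `col_classes` is a name introduced here for illustration:

```r
test <- matrix(c("hi", "i", "am", "bob"), nrow = 1)
test <- rbind(test, c("hi", "i", "am", "alice"))

testDF <- as.data.frame(test, stringsAsFactors = FALSE)

# Class of every column; each should be "character", not "factor"
col_classes <- vapply(testDF, class, character(1))
print(col_classes)

# Appending a new string row now works without generating NAs
testDF <- rbind(testDF, c("hi", "i", "am", "happy"))
```
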
Sven Hohenstein
  • As of R 4.0.0, the data.frame() default has changed to stringsAsFactors = FALSE, so it no longer has to be set explicitly. The original problem no longer occurs. – Rainfall.NZ Jun 05 '21 at 21:29

The problem is that R (in versions before 4.0.0), by default, converts character vectors to factors when it builds a data frame. To avoid this behaviour:

options(stringsAsFactors = FALSE)
test <- data.frame()
test <- rbind(test, c("hi", "i", "am", "bob"))
test <- rbind(test, c("hi", "i", "am", "alice"))
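
A global option affects every later data.frame() call in the session. As an alternative sketch (not part of the answer above), the first row can be created with an explicit stringsAsFactors = FALSE, so that later rows are appended to character columns. The column names V1 to V4 are chosen here only for illustration.

```r
# Create the first row with character columns, without touching global options
test <- data.frame(V1 = "hi", V2 = "i", V3 = "am", V4 = "bob",
                   stringsAsFactors = FALSE)

# Later string rows are appended to character columns, so no factor levels are involved
test <- rbind(test, c("hi", "i", "am", "alice"))
```
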
Andrea