
I am just starting with R and encountered a strange behaviour: when inserting the first row in an empty data frame, the original column names get lost.

example:

a<-data.frame(one = numeric(0), two = numeric(0))
a
#[1] one two
#<0 rows> (or 0-length row.names)
names(a)
#[1] "one" "two"
a<-rbind(a, c(5,6))
a
#  X5 X6
#1  5  6
names(a)
#[1] "X5" "X6"

As you can see, the column names one and two were replaced by X5 and X6.

Could somebody please tell me why this happens, and whether there is a right way to do this without losing the column names?

A shotgun solution would be to save the names in an auxiliary vector and then add them back when finished working on the data frame.

Thanks

Context:

I created a function which gathers some data and adds them as a new row to a data frame received as a parameter. I create the data frame, iterate through my data sources, passing the data.frame to each function call to be filled up with its results.

cdmihai

10 Answers


The rbind help page specifies that:

For ‘cbind’ (‘rbind’), vectors of zero length (including ‘NULL’) are ignored unless the result would have zero rows (columns), for S compatibility. (Zero-extent matrices do not occur in S3 and are not ignored in R.)

So a, having zero rows, is ignored in your rbind call. It is not ignored entirely, though: because a is a data frame, rbind is still dispatched to rbind.data.frame, which then builds the column names from the vector alone:

rbind.data.frame(c(5,6))
#  X5 X6
#1  5  6

One way to insert the row is:

a[nrow(a)+1,] <- c(5,6)
a
#  one two
#1   5   6

But there may be a better way to do it depending on your code.
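
For the workflow described in the question (a function that receives the data frame and adds a row to it), the index assignment above could be wrapped roughly as follows. This is only a sketch: `add_result` and the values passed to it are hypothetical names chosen for illustration, and it assumes all columns are numeric.

```r
# Hypothetical helper: appends one numeric row by index assignment,
# which keeps the existing column names.
add_result <- function(df, values) {
  df[nrow(df) + 1, ] <- values
  df
}

a <- data.frame(one = numeric(0), two = numeric(0))
for (i in 1:3) {
  # Stand-in for "gather some data from source i"
  a <- add_result(a, c(i, i * 2))
}
names(a)
nrow(a)
```

Because R passes arguments by value, the function has to return the modified data frame and the caller has to reassign it.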

juba
    If the columns have different data types (`character` and `numeric`, for example), it is better to pass a list, e.g. `list("five",6)`. Otherwise `c()` coerces everything to character. – Untitpoi Sep 10 '19 at 13:50

I almost gave up on this issue.

1) Create the data frame with stringsAsFactors set to FALSE, or you run straight into the next issue.

2) Don't use rbind; I have no idea why it messes up the column names. Simply do it this way:

df[nrow(df)+1,] <- c("d","gsgsgd",4)

df <- data.frame(a = character(0), b=character(0), c=numeric(0))

df[nrow(df)+1,] <- c("d","gsgsgd",4)

#Warning messages:
#1: In `[<-.factor`(`*tmp*`, iseq, value = "d") :
#  invalid factor level, NAs generated
#2: In `[<-.factor`(`*tmp*`, iseq, value = "gsgsgd") :
#  invalid factor level, NAs generated

df <- data.frame(a = character(0), b=character(0), c=numeric(0), stringsAsFactors=FALSE)

df[nrow(df)+1,] <- c("d","gsgsgd",4)

df
#  a      b c
#1 d gsgsgd 4
Raffael
  • Be aware that with that method the `c` column is not numeric anymore! str(df) says it is character. – Untitpoi Sep 10 '19 at 13:39
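
The type problem raised in this comment comes from `c()`, which coerces all of its elements to a single type (here character). A minimal sketch of the same assignment using `list()` instead, which keeps each value's own type; the data frame is the one from the answer:

```r
df <- data.frame(a = character(0), b = character(0), c = numeric(0),
                 stringsAsFactors = FALSE)

# c("d", "gsgsgd", 4) would be a character vector, so column c would
# end up as character. A list keeps "d" as character and 4 as numeric.
df[nrow(df) + 1, ] <- list("d", "gsgsgd", 4)

str(df)
```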

A workaround would be:

a <- rbind(a, data.frame(one = 5, two = 6))

?rbind states that merging objects demands matching names:

It then takes the classes of the columns from the first data frame, and matches columns by name (rather than by position)
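
The matching by name described in that quote can be seen in a small sketch where the second appended row deliberately lists its columns in the opposite order (the values are made up for illustration):

```r
a <- data.frame(one = numeric(0), two = numeric(0))

# First row: names and order agree with the empty data frame.
a <- rbind(a, data.frame(one = 5, two = 6))

# Second row: columns given in reverse order; rbind lines them up by name.
a <- rbind(a, data.frame(two = 8, one = 7))

a
```

The value 7 ends up in column one and 8 in column two, even though they were supplied in the other order.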

Roman Luštrik

FWIW, an alternative design might have your functions building vectors for the two columns, instead of rbinding to a data frame:

ones <- c()
twos <- c()

Modify the vectors in your functions:

ones <- append(ones, 5)
twos <- append(twos, 6)

Repeat as needed, then create your data.frame in one go:

a <- data.frame(one=ones, two=twos)
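
A variation on the same idea (not part of the answer above, just a sketch under the assumption that each step produces one complete row) is to collect one-row data frames in a list and bind them once at the end:

```r
rows <- list()
for (i in 1:3) {
  # Each iteration contributes a one-row data frame with the final names.
  rows[[length(rows) + 1]] <- data.frame(one = i, two = i^2)
}

# Bind all collected rows in a single call.
a <- do.call(rbind, rows)
a
```

This keeps the column names next to the values that fill them, and avoids growing the data frame row by row.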
David
    incredibly helpful. perhaps not as succinct, but the data flow is a little less black-boxy. – Andrew Jun 13 '12 at 19:49
  • Indeed a nice answer. But it seems very "not R". When constructing the data.frame you first need to *loop* over all the contents while row operators are workhorses of R. Maybe using the answer by @juba but set the colnames at the end: `colnames(a) <- c("one","two")`? – user989762 Aug 04 '15 at 03:54
  • The problem with this approach is, that you often require the colnames to do the extension of the data frame. Why are so simple things so complicated in r...? – TMOTTM Aug 05 '15 at 14:22

One way to make this work generically and with the least amount of re-typing the column names is the following. This method doesn't require hacking the NA or 0.

rs <- data.frame(i=numeric(), square=numeric(), cube=numeric())
for (i in 1:4) {
    calc <- c(i, i^2, i^3)
    # append calc to rs
    names(calc) <- names(rs)
    rs <- rbind(rs, as.list(calc))
}

rs will have the correct names

> rs
    i square cube
1   1      1    1
2   2      4    8
3   3      9   27
4   4     16   64

Another way to do this more cleanly is to use data.table:

> df <- data.frame(a=numeric(0), b=numeric(0))
> rbind(df, list(1,2)) # column names are messed up
  X1 X2
1  1  2

> library(data.table)
> df <- data.table(a=numeric(0), b=numeric(0))
> rbind(df, list(1,2)) # column names are preserved
   a b
1: 1 2

Notice that a data.table is also a data.frame.

> class(df)
[1] "data.table" "data.frame"
Steve Lihn

You can do this:

Give the initial data frame one row (of NAs):

df <- data.frame(matrix(nrow = 1, ncol = length(newrow)))

Add your new row and take out the NAs:

newdf <- na.omit(rbind(newrow, df))

But watch out: if newrow itself contains NAs, it will be removed too.

Cheers Agus

Agus camacho

I use the following solution to add a row to an empty data frame:

d_dataset <- 
  data.frame(
    variable = character(),
    before = numeric(),
    after = numeric(),
    stringsAsFactors = FALSE)

d_dataset <- 
  rbind(
    d_dataset,
      data.frame(
        variable = "test",
        before = 9,
        after = 12,
        stringsAsFactors = FALSE))  

print(d_dataset)

  variable before after
1     test      9    12

HTH.

Kind regards

Georg


Instead of constructing the data.frame with numeric(0) I use as.numeric(0).

a<-data.frame(one=as.numeric(0), two=as.numeric(0))

This creates an extra initial row

a
#    one two
#1   0   0

Bind the additional rows

a<-rbind(a,c(5,6))
a
#    one two
#1   0   0
#2   5   6

Then use negative indexing to remove the first (bogus) row

a<-a[-1,]
a

#    one two
#2   5   6

Note: it messes up the index (far left). I haven't figured out how to prevent that (anyone else?), but most of the time it probably doesn't matter.
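
The row index mentioned in the note can be reset by assigning `NULL` to the row names, which restores the default 1, 2, ... numbering. A short sketch repeating the steps above:

```r
a <- data.frame(one = as.numeric(0), two = as.numeric(0))
a <- rbind(a, c(5, 6))
a <- a[-1, ]          # drop the bogus first row; the remaining row is labelled "2"

# Assigning NULL resets the row names to the default sequence.
rownames(a) <- NULL
a
```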

Daniel

Researching this venerable R annoyance brought me to this page. I wanted to add a bit more explanation to Georg's excellent answer (https://stackoverflow.com/a/41609844/2757825), which not only solves the problem raised by the OP (losing field names) but also prevents the unwanted conversion of all fields to factors. For me, those two problems go together. I wanted a solution in base R that doesn't involve writing extra code but preserves the two distinct operations: define the data frame, append the row(s)--which is what Georg's answer provides.

The first two examples below illustrate the problems and the third and fourth show Georg's solution.

Example 1: Append the new row as vector with rbind

  • Result: loses column names AND converts all variables to factors
my.df <- data.frame(
    name = character(0),
    score = numeric(0),
    stringsAsFactors=FALSE
    )
my.df <- rbind(
    my.df, 
    c("Bob", 250) 
    )
    
my.df
  X.Bob. X.250.
1    Bob    250

str(my.df)
'data.frame':   1 obs. of  2 variables:
 $ X.Bob.: Factor w/ 1 level "Bob": 1
 $ X.250.: Factor w/ 1 level "250": 1

Example 2: Append the new row as a data frame inside rbind

  • Result: keeps column names but still converts character variables to factors.
my.df <- data.frame(
    name = character(0),
    score = numeric(0),
    stringsAsFactors=FALSE
    )
my.df <- rbind(
    my.df, 
    data.frame(name="Bob", score=250) 
    )
    
my.df
  name score
1  Bob   250

str(my.df)
'data.frame':   1 obs. of  2 variables:
 $ name : Factor w/ 1 level "Bob": 1
 $ score: num 250

Example 3: Append the new row inside rbind as a data frame, with stringsAsFactors=FALSE

  • Result: problem solved.
my.df <- data.frame(
    name = character(0),
    score = numeric(0),
    stringsAsFactors=FALSE
    )
my.df <- rbind(
    my.df, 
    data.frame(name="Bob", score=250, stringsAsFactors=FALSE) 
    )
    
my.df
  name score
1  Bob   250

str(my.df)
'data.frame':   1 obs. of  2 variables:
 $ name : chr "Bob"
 $ score: num 250

Example 4: Like example 3, but adding multiple rows at once.

my.df <- data.frame(
    name = character(0),
    score = numeric(0),
    stringsAsFactors=FALSE
    )
my.df <- rbind(
    my.df, 
    data.frame(
        name=c("Bob", "Carol", "Ted"), 
        score=c(250, 124, 95), 
        stringsAsFactors=FALSE) 
    )

str(my.df)
'data.frame':   3 obs. of  2 variables:
 $ name : chr  "Bob" "Carol" "Ted"
 $ score: num  250 124 95

my.df
   name score
1   Bob   250
2 Carol   124
3   Ted    95


You can use add_row from the tibble package:

tibble::add_row(a, one = c(5, 10), two = c(6, 8))

Output

  one two
1   5   6
2  10   8
LMc