Converting two column vectors of a data frame into a single numeric column

Question

Consider the following toy data frame of my seed study:

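site <- LETTERS[1:12]
site1 <- rep(site, each = 80)

fate <- c('germinated', 'viable', 'dead')
fate1 <- rep(fate, each = 320)

number <- 41:1000

# stringsAsFactors = TRUE keeps factor columns in R >= 4.0.0, matching the str() output below
df <- data.frame(site1, fate1, number, stringsAsFactors = TRUE)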

> str(df)
'data.frame':   960 obs. of  3 variables:
 $ site1 : Factor w/ 12 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ fate1 : Factor w/ 3 levels "dead","germinated",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ number: int  41 42 43 44 45 46 47 48 49 50 ...

I want R to go through all observations that are "dead" and assign 0 to each of them. Similarly, I want to assign 1 to all "viable" observations and 2 to all "germinated" observations.

My final result would be a single numeric column, something like this:

> year16
  [1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0
 [38] 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1

All suggestions are welcome.

Also `match(df$fate1, c("dead", "viable", "germinated")) - 1` — David Arenburg, May 10 '18 at 10:30
Possible duplicate, related post: https://stackoverflow.com/questions/7547597/dictionary-style-replace-multiple-items — zx8754, May 10 '18 at 14:32
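A quick sketch of the `match()` suggestion on a small hypothetical vector (`fates` is made up for illustration): `match()` returns each value's position in the lookup vector, so subtracting 1 turns positions 1, 2, 3 into the codes 0, 1, 2. Values missing from the lookup vector would come back as NA.

```r
# Hypothetical example vector; in the question this would be df$fate1
fates <- factor(c("germinated", "viable", "dead", "dead", "germinated"))

# The lookup order defines the codes: dead -> 1, viable -> 2, germinated -> 3
# (match() converts the factor to character before matching)
codes <- match(fates, c("dead", "viable", "germinated")) - 1
print(codes)
# [1] 2 1 0 0 2
```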

score 3 · Accepted Answer · answered May 10 '18 at 10:25


As zx8754 mentioned, you can have a look at the properties of a factor.

year16 <- as.numeric(factor(df$fate1, levels = c("dead", "viable", "germinated")))-1

Here I first reorder the levels of df$fate1 so that dead is coded as 1, viable as 2 and germinated as 3. Since you want the sequence to start at 0, I subtract 1 after converting the factor to a numeric variable.
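The same steps on a small hypothetical vector, to show what the intermediate integer codes look like. This is a sketch only; `fates` stands in for `df$fate1`, and any value not listed in `levels` would become NA.

```r
# Hypothetical small vector standing in for df$fate1
fates <- c("germinated", "viable", "dead", "viable")

# Step 1: set the level order explicitly (dead, viable, germinated)
f <- factor(fates, levels = c("dead", "viable", "germinated"))

# Step 2: the underlying integer codes follow that order (1, 2, 3)
print(as.integer(f))
# [1] 3 2 1 2

# Step 3: shift down by one so the codes start at 0
year16 <- as.integer(f) - 1L
print(year16)
# [1] 2 1 0 1
```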


kath


This idea is clever, and I like it +1, but I wonder about the performance implication of rehashing every entry in the factor. – Tim Biegeleisen May 10 '18 at 10:30
Thank you @Kath. It works the same way as Tim Biegeleisen's suggestion. Kindly read my comment under Tim's answer about one more issue. – Muneer May 10 '18 at 10:40
@TimBiegeleisen Yes, you're right! I actually like your solution much better, but I wanted to provide a base R approach. – kath May 10 '18 at 10:40

@TimBiegeleisen I've tested on a 15MM data set and the tidyverse (your) solution is slower by a factor of 6 – David Arenburg May 10 '18 at 10:57
@Kath There is one problem: it gives me 0, 1 and 2 twelve times each, which is not what I want. Instead, I want R to repeat 0, 1 or 2 as many times as the count in column "number". Is that possible? – Muneer May 10 '18 at 11:04

@Muneer Does `rep(year16, df$number)` give the result you're looking for? – kath May 10 '18 at 11:15
@Kath Yes, that solved my problem, exactly the way I wanted. Thank you so much. – Muneer May 10 '18 at 11:21
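The `rep()` fix works because `times` may be a vector with one count per element of `x` (same length, non-negative whole numbers). A small sketch with made-up codes and counts, not the question's data:

```r
# Hypothetical per-row codes and counts (e.g. seeds per site and fate)
codes  <- c(0, 1, 2)
counts <- c(2, 3, 1)

# Each code is repeated as many times as its matching count
expanded <- rep(codes, times = counts)
print(expanded)
# [1] 0 0 1 1 1 2

# The expanded vector has one entry per individual seed
print(length(expanded) == sum(counts))
# [1] TRUE
```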

score 2 · Answer 2 · answered May 10 '18 at 10:20

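Using case_when() from the dplyr package:

library(dplyr)

# Map each fate to a numeric code; anything unexpected becomes -1
# (comparing the column directly works whether fate1 is a factor or character)
df$year16 <- case_when(
  df$fate1 == "dead"       ~ 0,
  df$fate1 == "viable"     ~ 1,
  df$fate1 == "germinated" ~ 2,
  TRUE                     ~ -1
)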

Note: The solutions given by @David and @kath are much more elegant than this, but the approach above would still work if the replacements were non-numeric.
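For comparison, non-numeric replacements can also be done in base R with a named vector used as a lookup table (the "dictionary-style" replace from the linked post). A sketch only; the labels "none", "dormant" and "sprouted" are purely hypothetical, and a fate missing from the lookup would give NA:

```r
# Hypothetical labels; any type of replacement value works the same way
lookup <- c(dead = "none", viable = "dormant", germinated = "sprouted")

fates <- factor(c("viable", "dead", "germinated"))

# Index the lookup by the character values; unname() drops the names
labels <- unname(lookup[as.character(fates)])
print(labels)
```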

edited May 10 '18 at 10:46

answered May 10 '18 at 10:20

Tim Biegeleisen


@TimBiegeleisen Thank you very much, it solved my problem. But there is one more issue: e.g. if site A has 40 "germinated" seeds, it just assigns 1 to all 40, whereas I would like 1 repeated forty times. Is that possible? – Muneer May 10 '18 at 10:37

What is the difference between assigning 1 40 times and repeating 1 40 times? – Tim Biegeleisen May 10 '18 at 10:43
Literally, there is no difference, but it might matter once I run the model. – Muneer May 10 '18 at 10:47
The problem is that both solutions give me 0, 1 and 2 twelve times each, which is not what I want. – Muneer May 10 '18 at 11:00

score 0 · Answer 3 · answered May 10 '18 at 10:25


Base R solution:

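# Map a single fate value to its numeric code
assignnum <- function(x) {
  if (x == 'viable') {
    z <- 1
  } else if (x == 'dead') {
    z <- 0
  } else if (x == 'germinated') {
    z <- 2
  } else {
    # fail clearly instead of with "object 'z' not found"
    stop("unexpected fate: ", x)
  }
  z
}

# Apply element by element (simple, but slower than vectorised approaches)
df['result'] <- sapply(df$fate1, assignnum)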
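If the element-wise function is kept, `vapply()` is a slightly safer variant of `sapply()` because it declares the result type up front. A sketch on a small hypothetical vector, checking the loop against the vectorised `match()` approach from the question's comments:

```r
# Element-wise mapping, as in the answer above
assignnum <- function(x) {
  if (x == "viable") {
    1
  } else if (x == "dead") {
    0
  } else if (x == "germinated") {
    2
  } else {
    stop("unexpected fate: ", x)
  }
}

# Hypothetical input standing in for df$fate1
fates <- c("dead", "viable", "germinated", "viable")

# vapply() guarantees exactly one numeric value per element
looped <- vapply(fates, assignnum, numeric(1), USE.NAMES = FALSE)

# Vectorised equivalent: one call instead of one per element
vectorised <- match(fates, c("dead", "viable", "germinated")) - 1

stopifnot(identical(looped, vectorised))
print(looped)
# [1] 0 1 2 1
```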


Ollie Perkins

