I have a dataset with information about the education track of some candidates. For instance, I have a column where people with PhD degrees had to enter their field; for everyone else the value is NA. Like this:

Participants   PHD_Field
A              Economics
B              Sciences
C              NA
D              NA
E              NA

I need to create a new column that is 1 where a field is given and 0 where the value is NA. Could you please help me with the code to do this in R?

camille
    I don't understand your question - can you please provide more details with the intended input and output? Does this answer your question? https://stackoverflow.com/questions/48649443/how-to-one-hot-encode-several-categorical-variables-in-r – jared_mamrot Aug 11 '21 at 23:45

3 Answers

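library(tidyverse)  # loads dplyr, which provides mutate() and %>%

# Sample data from the question
df1 <- data.frame(
  stringsAsFactors = FALSE,
  Participants = c("A", "B", "C", "D", "E"),
  PHD_Field = c("Economics", "Sciences", NA, NA, NA)
)

# !is.na() is TRUE where a field is present; as.integer() maps TRUE/FALSE to 1/0
df1 %>%
  mutate(phd = as.integer(!is.na(PHD_Field)))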
#>   Participants PHD_Field phd
#> 1            A Economics   1
#> 2            B  Sciences   1
#> 3            C      <NA>   0
#> 4            D      <NA>   0
#> 5            E      <NA>   0
crestor

Here is a base R solution using ifelse, assuming your data frame is named df:

df$phd <- ifelse(is.na(df$PHD_Field), 0, 1)

Output:

  Participants PHD_Field   phd
  <chr>        <chr>     <dbl>
1 A            Economics     1
2 B            Sciences      1
3 C            NA            0
4 D            NA            0
5 E            NA            0
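If an integer column is preferred over the double column shown above, a small variation on the same `ifelse` idea might look like the sketch below. It rebuilds `df` from the question's sample data (assumed here to be a plain character column) so that it runs on its own.

```r
# Sample data rebuilt from the question (assumed to be a character column)
df <- data.frame(
  Participants = c("A", "B", "C", "D", "E"),
  PHD_Field = c("Economics", "Sciences", NA, NA, NA),
  stringsAsFactors = FALSE
)

# 0L and 1L are integer literals, so the new column is integer rather than double
df$phd <- ifelse(is.na(df$PHD_Field), 0L, 1L)

# Inspect the column type
class(df$phd)
```

Whether the column is double or integer rarely matters for modelling; the integer form mainly keeps the type consistent with the `as.integer()` approach.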
TarJae
df$phd <- as.integer(!is.na(df$PHD_Field))

Process Breakdown:

is.na(df$PHD_Field) # is.na: is the value NA?

[1] FALSE FALSE  TRUE  TRUE  TRUE

!is.na(df$PHD_Field) # ! (NOT) reverses the logic: is the value NOT NA?

[1]  TRUE  TRUE FALSE FALSE FALSE

as.integer(!is.na(df$PHD_Field)) # as.integer turns FALSE into 0 and TRUE into 1

[1] 1 1 0 0 0

df$phd <- as.integer(!is.na(df$PHD_Field)) # assign to the dataframe field phd
df
  Participants PHD_Field phd
1            A Economics   1
2            B  Sciences   1
3            C      <NA>   0
4            D      <NA>   0
5            E      <NA>   0
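The steps above rely on the missing fields being real `NA` values. The question also mentions "N/As", so here is a minimal sketch for the case where they were typed in as text instead. The data frame `df2`, the vector `missing_codes`, and the particular placeholder strings are assumptions made for illustration, not part of the original data.

```r
# Hypothetical data: missing fields stored as text placeholders, not real NA
df2 <- data.frame(
  Participants = c("A", "B", "C", "D", "E"),
  PHD_Field = c("Economics", "Sciences", "N/A", "NA", ""),
  stringsAsFactors = FALSE
)

# Placeholders to treat as missing (an assumption; adjust to your data)
missing_codes <- c("N/A", "NA", "")

# Convert placeholders to real NA, then flag the remaining fields with 1
df2$PHD_Field[trimws(df2$PHD_Field) %in% missing_codes] <- NA
df2$phd <- as.integer(!is.na(df2$PHD_Field))
df2
```

If the data comes from a CSV file, passing `na.strings = c("NA", "N/A", "")` to `read.csv()` should achieve the same conversion at import time.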
M.Viking