Convert one column in data frame into several columns with binary presence/absence values

Question

I am struggling a bit with a long-to-wide conversion of a single column in a data frame that has more than 300 columns. It is like a pivot operation, but only for one column, where the strings in that column should be used to create presence/absence values in the new columns.

One of the columns includes more than 1000 unique character strings (different agricultural practices). I would like to convert that column into multiple columns with binary presence/absence values for each of the unique character strings in the original column.

The only help I could find online for this kind of operation required writing out all the header names of the new (wide) columns by hand, based on the character strings in the long column.

(If possible, a tidyverse version of the code would be very welcome.)

P.S.: Suggestions on how to frame this question are also welcome.

Thank you!

This process is called "one-hot encoding". Try this link for a few solutions: https://datatricks.co.uk/one-hot-encoding-in-r-three-simple-methods – Sandwichnick, Jun 17 '21 at 08:53
Thank you! Is it also possible to get 1/0 instead of TRUE/FALSE? – Kamau Lindhardt, Jun 17 '21 at 10:13
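For comparison, here is a base-R sketch of the same one-hot encoding that gives 1/0 directly. It is not necessarily one of the methods on the linked page. The data and the names `long`, `farm` and `practice` are hypothetical. `table()` counts each farm/practice pair, `> 0` turns the counts into presence/absence, and the unary `+` converts TRUE/FALSE to 1/0.

```r
# Hypothetical long data: one row per (farm, practice) observation
long <- data.frame(
  farm     = c("A", "A", "B", "C", "C"),
  practice = c("tillage", "cover crop", "tillage", "irrigation", "irrigation")
)

# Rows = farms, columns = unique practices, cells = number of occurrences
counts <- table(long$farm, long$practice)

# Presence/absence as 1/0 (farm C reports irrigation twice but still gets 1)
wide <- +(counts > 0)

# As a plain data frame with one column per practice, farms as row names
wide_df <- as.data.frame.matrix(wide)
wide_df
```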

Sandwichnick · Answer 1 · score 1 · 2021-06-17T10:17:40.330

This would be a tidy solution:

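library(tidyverse)

# Making mock data: 1000 ids, each with one random "practice" string ("0" to "300")
df <- data.frame(id = 1:1000, mystrings = round(runif(1000, min = 0, max = 300)))
df$mystrings <- as.character(df$mystrings)

## Calculation starts here
df$value <- TRUE # presence marker for every observed id/string pair

# pivot wider: one new column per unique string in mystrings
df <- df %>% pivot_wider(names_from = mystrings,
                         values_from = value,
                         values_fill = FALSE) # absent pairs (otherwise NA) become FALSE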
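The mock data above has exactly one string per id. If the same id/string pair can occur more than once, `pivot_wider()` warns that the values are not uniquely identified and returns list-columns. A minimal sketch that de-duplicates first, on hypothetical data where id 1 reports "a" twice:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: id 1 has the string "a" twice
dup <- data.frame(id = c(1, 1, 2), mystrings = c("a", "a", "b"))

wide <- dup %>%
  distinct(id, mystrings) %>%          # keep one row per id/string pair
  mutate(value = TRUE) %>%             # presence marker
  pivot_wider(names_from = mystrings,
              values_from = value,
              values_fill = FALSE)     # absent pairs become FALSE
wide
```

If the other columns must be kept, `distinct(id, mystrings, .keep_all = TRUE)` retains the first row's values for them.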

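Using a factor solution (0/1 instead of FALSE/TRUE). Run it on the original long `df`, i.e. recreate the mock data first, because the code above overwrites `df` with the wide result: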

## Calculation starts here
df$value <- factor(x = 1, levels = c(0, 1)) # every observed pair gets the level "1"
# alternatively use a plain number: df$value <- 1 (together with values_fill = 0)

#pivot wider

df <- df %>% pivot_wider(names_from = mystrings,
                         values_from = value,
                         values_fill = factor(0, levels = c(0, 1))) # the "0" used for absent pairs
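If the TRUE/FALSE result of the first solution already exists, another option is to convert the logical columns afterwards instead of pivoting again. A sketch using dplyr's `across()` and `where()` (dplyr 1.0 or later) on a small hypothetical wide data frame:

```r
library(dplyr)

# Hypothetical wide result of the TRUE/FALSE solution
wide <- data.frame(id = 1:3,
                   tillage    = c(TRUE, FALSE, TRUE),
                   irrigation = c(FALSE, FALSE, TRUE))

# Every logical column becomes 0/1 integers; the id column is left untouched
wide01 <- wide %>% mutate(across(where(is.logical), as.integer))

# If 0/1 factors are really needed instead
wide_fct <- wide %>%
  mutate(across(where(is.logical), ~ factor(as.integer(.x), levels = c(0, 1))))
```

With many columns, `where(is.logical)` avoids listing the new column names by hand.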

edited Jun 17 '21 at 10:17

answered Jun 17 '21 at 09:10

Sandwichnick


Hi @sandwichnick Thank you so much! It worked perfectly. Could you also suggest how I could then convert the TRUE/FALSE values into 1/0 factor values across all the generated columns? Best regards, – Kamau Lindhardt Jun 17 '21 at 09:34

I edited my answer. That said, if you only need a binary indicator, I would recommend keeping TRUE and FALSE, as logical columns need less memory than numeric 0/1 columns (4 bytes per value instead of 8 for doubles). You can still calculate `sum()` or `mean()` on logical vectors; R treats TRUE as 1 and FALSE as 0. – Sandwichnick Jun 17 '21 at 10:19
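A short sketch of what the comment describes, using hypothetical presence/absence columns kept as logicals:

```r
# Hypothetical presence/absence column kept as logical
tillage <- c(TRUE, FALSE, TRUE, TRUE)

sum(tillage)   # 3: number of observations with the practice
mean(tillage)  # 0.75: share of observations with the practice

# Column-wise on a whole wide data frame of logical columns
wide <- data.frame(tillage    = tillage,
                   irrigation = c(FALSE, FALSE, TRUE, FALSE))
colSums(wide)   # count per practice
colMeans(wide)  # prevalence per practice

# Memory: one million logicals take about 4 MB, one million doubles about 8 MB
object.size(rep(TRUE, 1e6))
object.size(rep(1, 1e6))
```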

score 0 · Answer 2 · answered Jun 17 '21 at 09:26

Thank you ☝

I hope it will not exhaust my memory. I also found the answer below, which is exactly what I am looking for, but it exhausts my vector memory, since my data frame has 107,800 rows (observations) and 350 columns.

library(tidyr)
studentInfo <- data.frame(
  StudentID = c(1,1,1,2,3,3),
  Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

pivot_wider(studentInfo,
            names_from = "Subject",
            values_from = "Subject",
            values_fill = 0,              # subject not taken -> 0
            values_fn = function(x) 1)    # subject taken (even if repeated) -> 1
#> # A tibble: 3 x 5
#>   StudentID Maths Science English History
#>       <dbl> <int>   <int>   <int>   <int>
#> 1         1     1       1       1       0
#> 2         2     1       0       0       0
#> 3         3     0       0       0       1

Taken from the question "Reshape from long to wide and create columns with binary value".
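The memory problem comes from storing every 0 explicitly: with more than 1000 practices, almost every cell of the wide table is 0. A different representation is a sparse matrix from the Matrix package (a recommended package that ships with R), which stores only the 1s. This is an alternative to `pivot_wider()`, not what the answers above use. The sketch below reuses the `studentInfo` example; in a real data frame the remaining columns would stay in the original table and could be matched back by ID.

```r
library(Matrix)  # recommended package, installed with R

studentInfo <- data.frame(
  StudentID = c(1, 1, 1, 2, 3, 3),
  Subject   = c("Maths", "Science", "English", "Maths", "History", "History"))

# Drop repeated pairs first (student 3 has History twice);
# sparseMatrix() would otherwise add them up to 2
pairs <- unique(studentInfo[, c("StudentID", "Subject")])

ids      <- unique(pairs$StudentID)
subjects <- unique(pairs$Subject)

# Only the (row, column) positions listed here are stored, each with value 1
m <- sparseMatrix(
  i        = match(pairs$StudentID, ids),
  j        = match(pairs$Subject, subjects),
  x        = 1,
  dims     = c(length(ids), length(subjects)),
  dimnames = list(as.character(ids), subjects)
)
m
```

`as.matrix(m)` converts it back to an ordinary dense matrix, which brings back the memory cost, so it is best applied to subsets.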
