0

I have a data frame that looks like this:

+-----------+------------+-----------+-----+----------------+
| Unique ID | First Name | Last Name | Age | Characteristic |
+-----------+------------+-----------+-----+----------------+
|         1 | Bob        | Smith     |  25 | Intelligent    |
|         1 | Bob        | Smith     |  25 | Funny          |
|         1 | Bob        | Smith     |  25 | Short          |
|         2 | Jim        | Murphy    |  62 | Tall           |
|         2 | Jim        | Murphy    |  62 | Funny          |
|         3 | Kelly      | Green     |  33 | Tall           |
+-----------+------------+-----------+-----+----------------+

I want to convert the "Characteristic" column into a row header, and for the present of that characteristic in each record populate it with a 1 if they have it or a 0 if they don't, such that I only have 1 row per record and my output looks like:

+-----------+------------+-----------+-----+-------------+-------+-------+------+
| Unique ID | First Name | Last Name | Age | Intelligent | Funny | Short | Tall |
+-----------+------------+-----------+-----+-------------+-------+-------+------+
|         1 | Bob        | Smith     |  25 |           1 |     1 |     1 |    0 |
|         2 | Jim        | Murphy    |  62 |           0 |     1 |     0 |    1 |
|         3 | Kelly      | Green     |  33 |           0 |     0 |     0 |    1 |
+-----------+------------+-----------+-----+-------------+-------+-------+------+
Stelav
  • 25
  • 4

2 Answers2

3

A more consumable data, and a solution using dplyr and tidyr:

library(dplyr)
library(tidyr)
read.table(header=TRUE, stringsAsFactors=FALSE, text="
  Unique_ID   First_Name   Last_Name   Age   Characteristic  
          1   Bob          Smith        25   Intelligent     
          1   Bob          Smith        25   Funny           
          1   Bob          Smith        25   Short           
          2   Jim          Murphy       62   Tall            
          2   Jim          Murphy       62   Funny           
          3   Kelly        Green        33   Tall") %>%
  mutate(v = 1L) %>%
  tidyr::spread(Characteristic, v, fill=0L)
#   Unique_ID First_Name Last_Name Age Funny Intelligent Short Tall
# 1         1        Bob     Smith  25     1           1     1    0
# 2         2        Jim    Murphy  62     1           0     0    1
# 3         3      Kelly     Green  33     0           0     0    1

Most of the work was done with spread. Unfortunately, this has NA instead of 0 for all of the empty spots. If you can live with it, you're good. (Edited based on @www's suggestion.)

r2evans
  • 141,215
  • 6
  • 77
  • 149
2

Here is another tidyverse solution.

df %>%
  mutate(ind = 1L) %>%
  spread(Characteristic, ind, fill = 0L)

#   Unique.ID First.Name Last.Name Age Funny Intelligent Short Tall
# 1         1        Bob     Smith  25     1           1     1    0
# 2         2        Jim    Murphy  62     1           0     0    1
# 3         3      Kelly     Green  33     0           0     0    1

You can also use reshape2 to account for the case when there are more than 1 instance of each case.

library(reshape2)
dcast(df, ...~Characteristic, fun.aggregate = length)

The data

df <- read.table(text = "Unique ID | First Name | Last Name | Age | Characteristic 
         1 | Bob        | Smith     |  25 | Intelligent    
         1 | Bob        | Smith     |  25 | Funny          
         1 | Bob        | Smith     |  25 | Short          
         2 | Jim        | Murphy    |  62 | Tall           
         2 | Jim        | Murphy    |  62 | Funny          
         3 | Kelly      | Green     |  33 | Tall         ", sep = "|", header = T, strip.white = T, stringsAsFactors = F)
hpesoj626
  • 3,529
  • 1
  • 17
  • 25