
I have a long format data set that I want to change into wide format. I have tried to use the reshape() function (from base R's stats package) to do it.

My dataset "df_long" has the following three columns:

  1. a numeric "ID" variable
  2. a character variable "positive" ("true positive", "false positive")
  3. a numeric "coder" variable (1,2)

I have made sure that each combination of "ID" and "coder" appears only once.
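One way to confirm that no ID/coder pair occurs twice is base R's anyDuplicated() applied to just those two columns. A minimal sketch, using a few hypothetical rows shaped like df_long (the IDs are borrowed from the answer's sample data):

```r
# Hypothetical rows in the same shape as df_long (ID, positive, coder)
df_long <- data.frame(
  ID       = c("147256167", "147256167", "147256191", "147256191"),
  positive = c("True positive", "False positive", "True positive", "False positive"),
  coder    = c(2, 1, 2, 1)
)

# anyDuplicated() returns 0 when no ID/coder pair occurs twice,
# otherwise the row index of the first duplicate
dup_row <- anyDuplicated(df_long[, c("ID", "coder")])

# Same check after deliberately appending a copy of row 1
df_bad <- rbind(df_long, df_long[1, ])
dup_row_bad <- anyDuplicated(df_bad[, c("ID", "coder")])

dup_row      # 0
dup_row_bad  # 5
```

If the check ever returns a nonzero index, reshape() would keep only the first matching row for that ID and coder (with a warning).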

[Image: my input data structure]

I want to transform it into wide format so that I have one row for each ID with the columns positive_coder1 and positive_coder2. This is the reshape code I used.

df_wide <- reshape(df_long, v.names = c("positive"), timevar = "coder", 
                idvar = "ID", direction = "wide", sep = "_")

What I get is this:

[Image: my output data structure]

There are no NAs in my input data. What am I missing?

Progman
Chris_S
1 Answer


The main issue is the class of your data: df_long is a tibble rather than a plain data.frame, and reshape() does not handle tibbles as expected, so convert it before reshaping. In this small example you will get one NA because the sample data contain only a portion of your rows (ID 147256379 has no coder 1 rating), but with your full dataset this should not happen:

# Convert the tibble to a plain data.frame
df_long <- as.data.frame(df_long)
# Reshape from long to wide
df_wide <- reshape(df_long, v.names = c("positive"), timevar = "coder", 
                   idvar = "ID", direction = "wide", sep = "_")
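If you want the exact column names from the question (positive_coder1, positive_coder2) with coder 1 first, a base R sketch: reshape() pastes sep between v.names and each coder value, and takes the coder values in the order they first appear, so passing sep = "_coder" and sorting by coder beforehand should do it. The rows below are this answer's sample data, built directly as a plain data.frame; the sep value and the sorting step are my own choices, not part of the original code.

```r
# Sample rows from this answer, created as a plain data.frame (no tibble)
df_long <- data.frame(
  ID = c("147256167", "147256167", "147256191", "147256191",
         "147256290", "147256290", "147256379"),
  positive = c("True positive", "False positive", "True positive",
               "False positive", "True positive", "True positive",
               "True positive"),
  coder = c(2, 1, 2, 1, 2, 1, 2)
)

# reshape() uses the coder values in order of first appearance,
# so sorting by coder puts coder 1's column before coder 2's
df_long <- df_long[order(df_long$coder), ]

# sep is pasted between v.names and the coder value:
# "positive" + "_coder" + "1" gives "positive_coder1"
df_wide <- reshape(df_long, v.names = "positive", timevar = "coder",
                   idvar = "ID", direction = "wide", sep = "_coder")

names(df_wide)  # "ID" "positive_coder1" "positive_coder2"
```

ID 147256379 still gets NA in positive_coder1, because the sample has no coder 1 row for it.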

Output:

df_wide
         ID    positive_2     positive_1
1 147256167 True positive False positive
3 147256191 True positive False positive
5 147256290 True positive  True positive
7 147256379 True positive           <NA>

To reproduce your error, assume that df_long has the following classes:

class(df_long)
[1] "tbl_df"     "tbl"        "data.frame"

Applying the same reshape() call to this tibble gives:

df_wide
# A tibble: 4 x 2
  ID        `positive_c(2, 1)`
  <chr>     <chr>             
1 147256167 NA                
2 147256191 NA                
3 147256290 NA                
4 147256379 NA 

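This is the same output you got, so the problem comes from the tibble class: a single column extracted from a tibble stays a tibble instead of becoming a vector, so reshape() does not see the coder values as separate times. Converting with df_long <- as.data.frame(df_long) fixes it.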
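A small base R sketch of where the odd column name positive_c(2, 1) most likely comes from. It assumes, as that name suggests, that reshape() collects the coder values with data[, timevar] and pastes them after v.names and sep. Here drop = FALSE imitates a tibble, which by default keeps a single extracted column as a table instead of simplifying it to a vector.

```r
df <- data.frame(positive = c("True positive", "False positive"),
                 coder = c(2, 1))

# Plain data.frame: [, "coder"] simplifies to the numeric vector c(2, 1)
times_vec <- unique(df[, "coder"])
# drop = FALSE keeps a one-column data frame, imitating a tibble's default
times_tbl <- unique(df[, "coder", drop = FALSE])

# paste() converts each argument with as.character(); a one-column table
# becomes a single deparsed string instead of one string per value
names_vec <- paste("positive", times_vec, sep = "_")
names_tbl <- paste("positive", times_tbl, sep = "_")

names_vec  # "positive_2" "positive_1"
names_tbl  # "positive_c(2, 1)"
```

With a vector there is one name per coder; with the table there is just one name, which matches the single all-NA column in the tibble output.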

Sample data used:

#Data
df_long <- structure(list(ID = c("147256167", "147256167", "147256191", 
"147256191", "147256290", "147256290", "147256379"), positive = c("True positive", 
"False positive", "True positive", "False positive", "True positive", 
"True positive", "True positive"), coder = c(2, 1, 2, 1, 2, 1, 
2)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"
))

Since you are working with tibbles, you could also use pivot_wider() from tidyr:

library(dplyr)
library(tidyr)
# Long to wide with tidyr
df_wide <- df_long %>% pivot_wider(names_from = coder,values_from=positive,
                        names_prefix = 'positive_')

With output:

# A tibble: 4 x 3
  ID        positive_2    positive_1    
  <chr>     <chr>         <chr>         
1 147256167 True positive False positive
2 147256191 True positive False positive
3 147256290 True positive True positive 
4 147256379 True positive NA            

This last approach is an alternative to reshape() that works on tibbles directly, without converting them first.

Duck