1

I am new to reshaping data frames. I have a df that I would like to make wider so I can use it in analysis such as cluster and NMDS. I have found several questions and (answers) related to how to reshape data containing primarily quantitative data (using aggregation functions), but in my case, my variables are all categorical.

As my df has a thousand rows and dozens of columns, I've created a toy df to use as an example. It looks like this:

df <- data.frame(
  id=c("a","c", "a","b","d","c","e","d","c","a","a","e","a","b","d"), 
  color=c("red", "blue", "gray", "yellow", "green","green","blue","purple"            ,"black","green","yellow","blue","red","yellow","gray"),
  fruit=c("apple", "orange", "avocado", "strawberry", "banana", "apple",               "orange", "avocado", "strawberry", "banana","banana", "strawberry",           "watermelon", "lemon", "lemon" ),
  country = c("Italy", "Spain", "Brazil", "Brazil", "Australia", "Italy",           "Japan", "India", "USA", "Mexico", "USA", "Mexico", "Spain",              "France", "France"),
  animal=c("alligator", "camel", "alligator", "bat", "dolphin", "camel",                "elephant", "dolphin", "camel", "alligator", "alligator",                    "elephant", "alligator", "bat", "dolphin")) 

I would like the column "id" to be the first in my reshaped data frame, "animal" to be the second, and then the levels of "color", "fruit", and "country". The point here is that I wanted them separated.

The code below shows some attempts I've made:

df <- dplyr::mutate_if(df,is.character,as.factor) 
attach(df)

dcast(df, id ~ color,value.var = "id") #The output is exactly what I wanted! 

dcast(df, id + animal ~ color,value.var = "id") #Exactly what I wanted!

dcast(df, id + animal ~ fruit,value.var = "id") #Exactly what I wanted!

dcast(df, id ~ country, value.var = "id") #Not the output I wanted. Only "works well" if I specify "fun.aggregate=length". Why?

dcast(df, id ~ color + country, value.var = "id") #Not the output what I wanted.

dcast(df, id + animal~ color + country, value.var = "id") #Not the output I wanted.

dcast(df, id + animal~ color + country + fruit, value.var = "id") #Not the output I wanted.

My expected reshaped df should look like this:

Expected reshape data frame

To achieve that, I've tried all commands below, but none of those worked well:

dcast(df, id + animal ~ color + country + fruit, fun.aggregate=length)

dcast(df, id + animal ~ c(color, country, fruit), fun.aggregate=length)

dcast(df, id + animal ~ c("color", "country", "fruit"), fun.aggregate=length)

dcast(df, id + animal ~ color:fruit, fun.aggregate=length)

I have also tried to do that using tidyr::pivot_wider, without success.

Is there a way to achieve my goal using reshape2::dcast or tidyr::pivot_wider, or any other function in R? I would appreciate it if you guys could help me. Thanks in advance.

mad10les
  • 152
  • 2
  • 9
  • Welcome to SO, doesn't [this post](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) answer your question? – Jan Apr 23 '21 at 17:31
  • Thank you @Jan, I'll check it out! – mad10les Apr 23 '21 at 19:59

1 Answers1

1

First, you have to pivot_longer to get the column names you desire into a column. Then I arranged it by the future column names, so the words would be grouped, like your image, then I used pivot_wider. It drops the animal column, so I put it back, then arranged by id, so they would be in the same observation order as your image.

pivot_longer(df, cols = color:country, names_to = "variable", 
             values_to = "value") %>%                       # column names to rows
  arrange(variable, value) %>%                              # organize future column names
  pivot_wider(!variable, names_from = value, values_from = animal, 
              values_fn = list(animal = length), values_fill = 0) %>%
  left_join(distinct(df[,c(1,5)])) %>%                      # add animals back
  select(id, animal, everything()) %>%                      # rearrange columns
  arrange(id)                                               # reorder observations

enter image description here

Update based on your comment - ordered by color, fruit, then country

Added mutate and modified the first arrange and pivot_wider:

pivot_longer(df,cols = color:country, names_to = "variable", 
             values_to = "value") %>%                # future col names to rows
  mutate(ordering = ifelse(variable == "color", 1,   # create organizer variable
                           ifelse(variable == "fruit", 2, 3))) %>% 
  arrange(ordering, value) %>%                       # organize future column order
  pivot_wider(!c(variable,ordering),                 # make it wide
              names_from = value, 
              values_from = animal, 
              values_fn = list(animal = length), 
              values_fill = 0) %>%
  left_join(distinct(df[,c(1,5)])) %>%               # add the animals back
  select(id, animal, everything()) %>%               # move animals to 2nd position
  arrange(id)                                        # reorder observations 

Check it out: enter image description here

Kat
  • 15,669
  • 3
  • 18
  • 51