I am trying to use the cast() function from the Reshape package to convert a long format dataframe into wide format. Below is the dataframe (named textMessagesLong) I am working with.

[Image: the textMessagesLong dataframe]

I am using cast() as shown in my textbook (Discovering Statistics Using R): cast(molten data, variables coded within a single column ~ variables coded across many columns, value = "outcome variable"), which in my case is cast(textMessagesLong, Group ~ variable, value = "value").

I am getting the "Aggregation requires fun.aggregate: length used as default" error.

I tried using the cast() function as indicated here: https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format, but it didn't resolve the error.

I also found the following thread, https://stackoverflow.com/questions/9621401/aggregation-requires-fun-aggregate-length-used-as-default, but didn't see the answer there (they are using cast() and dcast()).

I looked up ?cast(), which shows different arguments from those shown in the book.

To understand the issue a little better, I recreated the dataframe in wide format in MS Excel (saved as a .csv file) and added a column named "Persons" containing the values 1:50. I then read the file into a new dataframe (newDataFrame). Only then was I able to switch between long and wide formats using the following commands:

newDataFrame <- read.table("New Data Frame.csv", header = TRUE)
long <- melt(newDataFrame, id = c("Persons", "Group"), measured = c("Baseline", "Six_Months"))
wide <- cast(long, Persons + Group ~ variable, value = "value")

I am not sure why adding "Persons" to the dataframe made the difference. Apologies if my question is naive.

  • By the way, there are 100 rows in the textMessagesLong dataframe. My second image failed uploading. So, each time point + treatment arm has 25 participants. – WizardOfOzi May 28 '23 at 23:11
  • 50 people, 2 measurements each. – WizardOfOzi May 29 '23 at 02:10
  • Oh, yeah I figured that out. It is because the data is not stored with the subject IDs, so the computer doesn't know who is who in your first photo. – Vons May 29 '23 at 02:12
  • Thanks, Vons! That makes sense. I suppose, in a wide format, each row must define a unique entity (i.e., participant). – WizardOfOzi May 29 '23 at 02:18
  • Indeed, I think you got it! Without the ID column, there are multiple entries per combination, so an aggregation function is applied (here the default was `length`, which counts the observations, 25 in this case); you could also have used `mean` to calculate the average. But when ID and Group together are the identifiers, there is exactly one observation per level of variable (i.e., Baseline and Six Months), so this is called casting without aggregation. – Vons May 29 '23 at 02:26
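
The point made in the comments can be sketched with a small example. This is only an illustration under stated assumptions: the data frame `toy` and its values are made up (3 people per group instead of 25), and the column names simply mirror the ones in the question. It uses `melt()` and `cast()` from the reshape package with the argument names from their help pages (`id.vars`, `measure.vars`, `fun.aggregate`).

```r
library(reshape)  # provides melt() and cast()

# Hypothetical toy data (not the asker's): 2 groups x 3 people, 2 time points.
toy <- data.frame(
  Persons    = 1:6,
  Group      = factor(rep(c("Text Messagers", "Controls"), each = 3)),
  Baseline   = c(52, 68, 85, 65, 57, 72),
  Six_Months = c(32, 48, 62, 62, 58, 70)
)

# Wide -> long: one row per person per time point.
long <- melt(toy, id.vars = c("Persons", "Group"),
             measure.vars = c("Baseline", "Six_Months"))

# Group alone does not identify a row, so every Group x variable cell holds
# several values. cast() has to aggregate and, with no function supplied,
# falls back to length (the count of values per cell, 3 in this toy data).
counts <- cast(long, Group ~ variable)

# Supplying an aggregation function explicitly avoids the fallback.
means <- cast(long, Group ~ variable, fun.aggregate = mean)

# Persons + Group identifies each row uniquely, so there is exactly one value
# per cell and cast() reshapes without aggregating.
back <- cast(long, Persons + Group ~ variable)
```

Here `counts` reproduces the situation in the question, while `back` has one row per person again, which is why adding the "Persons" column made the round trip work.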
