
I revisited some old work, and code that used to work no longer does, for some reason.

This is a sample of my data:

data <- data.frame(
  geno_list = c('V', 'V', 'V', 'S', 'S', 'S', 'U', 'U', 'U', 'R', 'R', 'R', 'V', 'V', 'V', 'S', 'S', 'S', 'U', 'U', 'U', 'R', 'R', 'R', 'V', 'V', 'V', 'S', 'S', 'S', 'U', 'U', 'U', 'R', 'R', 'R'),
  cat = c(21624.53811, 15915.53721, 2417.154559, 16905.51314, 25170.28969, 3314.584993, 38078.29516, 30738.68487, 1710.259145, 47164.88036, 31280.77812, 1085.29931, 22351.4868, 23502.32928, 5133.710359, 37603.7506, 23208.48192, 15985.77371, 31346.78278, 41745.6875, 2141.689721, 43924.99181, 6991.750454, 15432.87087, 15346.27536, 27041.06851, 8310.91495, 25465.16134, 30320.12777, 9214.112545, 27005.31982, 29088.64926, 1337.82877, 29286.01962, 33435.10852, 1290.04402),
  Month = c('June', 'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June', 'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July', 'August', 'August', 'August', 'August', 'August', 'August', 'August', 'August', 'August', 'August', 'August', 'August')
)

My old code:

table_avg <- table %>%
  group_by(geno_list, Month) %>%
  summarize(avg = mean(as.numeric(cat))) %>%
  arrange(Month, geno_list)

I wanted a new column that gives the average of 'cat' for each pair of geno_list and Month, i.e. the average of V in June, V in July and V in August. But instead I get this error: Error in order(...) : argument lengths differ

If I remove arrange(Month, geno_list), I get a data frame with a single value, which is the average of ALL the 'cat' values.

I appreciate all the help. Thank you.

  • 1
    You have almost certainly loaded the `plyr` package after the `dplyr` package (or some other package you loaded brought `plyr` along with it), so you have a non-`dplyr` version of `summarize` masking `dplyr::summarize`. You can specify `dplyr::summarize`, or be more careful with your package loading. See the marked duplicate for more explanation and details. – Gregor Thomas Jun 13 '23 at 14:31
  • Run `conflicts()` to confirm which package is causing the name conflict. – Gregor Thomas Jun 13 '23 at 14:31
  • 1
    Check the number of values in each column. Your code, as is, has 36 values in geno_list and cat, and 37 in Month. Remove the last value from Month, and ensure you are referring to the correct dataset when grouping and summarising. – johnm Jun 13 '23 at 14:32
  • 1
    There's a problem with the data frame `data` here: the `geno_list` and `cat` have 36 elements, and `Month` has 37 so the data frame called `data` isn't being created. Also in your second piece of code, make sure you start with e.g. `data` - where is the object `table()` coming from? Fixing these, means that your code runs for me. – nrennie Jun 13 '23 at 14:32
  • @johnm good catch, although that may have been an issue from when I uploaded the code, as in my environment, it says 36 obs and 3 variables. – Chloe Katya Jun 13 '23 at 14:38
  • @GregorThomas THIS WAS THE ISSUE, THANK YOU!!!!!!!! I guess reloading dplyr afterwards doesn't change anything. For future reference, do I need to unload plyr first? – Chloe Katya Jun 13 '23 at 14:39
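
For anyone landing here later, a minimal sketch of the fix described in the comments above. It assumes `dplyr` is installed and uses a small made-up data frame (`toy`, hypothetical values, not the asker's data). Qualifying the verbs with `dplyr::` sidesteps the masking no matter which order the packages were loaded in. The `detach()` line is one possible way to drop `plyr` from the search path for the rest of the session; whether that is safe depends on what else in the session relies on `plyr`.

```r
library(dplyr)

# Toy data with hypothetical values (not the asker's data)
toy <- data.frame(
  geno_list = c("V", "V", "S", "S", "V", "V", "S", "S"),
  Month     = c("June", "June", "June", "June", "July", "July", "July", "July"),
  cat       = c(10, 20, 30, 50, 1, 3, 5, 9)
)

# List names defined in more than one attached package;
# summarize/arrange show up here when plyr masks dplyr
conflicts(detail = TRUE)

# Optional: detach plyr if it is attached, so that plain
# summarize() resolves to dplyr again
if ("package:plyr" %in% search()) detach("package:plyr")

# Namespace-qualified calls work regardless of load order
table_avg <- toy %>%
  dplyr::group_by(geno_list, Month) %>%
  dplyr::summarize(avg = mean(cat), .groups = "drop") %>%
  dplyr::arrange(Month, geno_list)

print(table_avg)
```

With the grouping respected, `table_avg` has one row per geno_list/Month pair rather than a single overall mean.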
