I am a total newbie with R, so I will be visiting this site a lot over the coming months. To get familiar with R, I am trying to load some (old) datasets into it, and I have already run into my first problem. I have a dataset (microarray data) with the following columns: probe_name, gene_name, systemic_name, log_fc, ave_expr and p_value. There are multiple read-outs per gene_name, so I want to summarise them by gene. However, I need to keep the systemic_name column. So far I have this:

df_expression <- microarray_data |>
  group_by(gene_name) |>
  summarise(
    mean_log_fc = mean(log_fc, na.rm = TRUE),
    mean_ave_exp = mean(ave_expr, na.rm = TRUE),
    mean_p_val = mean(p_value, na.rm = TRUE),
    mean_adj_p_val = mean(adj_p_val, na.rm = TRUE)
  )

This gives me the mean of each column, but the systemic_name column is lost. I tried several things without success. It should be simple, but I cannot figure it out.

I tried adding select(), but then the code stops at summarise().

  • Welcome to StackOverflow. Since we can't access your data, your issue isn't reproducible for others. Please check this post on how to write a reprex: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example You can also provide your data by using `dput(microarray_data)` or `dput(head(microarray_data))` if your dataset is very large. – jrcalabrese Oct 25 '22 at 15:30
  • When doing a group_by + summarise only the grouping columns and the columns created inside summarise will be present in the aggregated dataframe. I don't know that much about genes but you could add your systemic_name column to the group_by. – stefan Oct 25 '22 at 17:07
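
The suggestion in the last comment could look roughly like the sketch below. It assumes that each gene_name maps to exactly one systemic_name, and it uses a small made-up tibble in place of the real microarray_data (all values are invented for illustration, and the adj_p_val column is left out because it is not in the column list given in the question).

```r
library(dplyr)

# Toy stand-in for microarray_data; every value here is invented.
microarray_data <- tibble(
  probe_name    = c("p1", "p2", "p3"),
  gene_name     = c("geneA", "geneA", "geneB"),
  systemic_name = c("sysA", "sysA", "sysB"),
  log_fc        = c(1, 3, -2),
  ave_expr      = c(8, 10, 6),
  p_value       = c(0.01, 0.03, 0.20)
)

# Grouping by both columns keeps systemic_name in the summarised result.
df_expression <- microarray_data |>
  group_by(gene_name, systemic_name) |>
  summarise(
    mean_log_fc  = mean(log_fc, na.rm = TRUE),
    mean_ave_exp = mean(ave_expr, na.rm = TRUE),
    mean_p_val   = mean(p_value, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped data frame
  )

print(df_expression)
```

If a gene_name can carry more than one systemic_name, this returns one row per (gene_name, systemic_name) pair. In that case, grouping by gene_name only and adding something like `systemic_name = first(systemic_name)` inside summarise() may be closer to what is wanted, at the cost of silently dropping the other names.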
