Divide a big dataset in subsets of 100, calculate mean, and plot

Question

I have a huge data set that contains 30000 columns of data. I want to take one row and plot the means of consecutive sets of 100 (the first 100 entries, the second 100 entries, and so on), which gives 300 means in total. I have the script for the plot ready, but I can't figure out how to divide my data into sets of 100.

Can anybody help? Thank you.

The function I want to apply is CV <- function(x, ...){(sd(x, ...)/mean(x, ...))*100} and I've tried something like byapply(DataSet$column., rep(1:30000, each = 100), rowMeans) but this totally did not work — Anya Drake, Nov 28 '17 at 10:39
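
For reference, a minimal sketch of applying the CV function from this comment to consecutive blocks of 100 in base R. It assumes the row has already been extracted as a plain numeric vector; the vector x below is simulated stand-in data, and the names block and cv_by_block are made up for illustration. Note that the grouping vector needs 300 labels repeated 100 times each, so that it has the same length as the data (rep(1:30000, each = 100) would be 3,000,000 long).

```r
# Coefficient of variation, as defined in the comment above
CV <- function(x, ...) {
  (sd(x, ...) / mean(x, ...)) * 100
}

# Hypothetical stand-in for one row of the real data (30000 values)
set.seed(1)
x <- rnorm(30000, mean = 50, sd = 5)

# Block labels: 1 for entries 1-100, 2 for entries 101-200, ..., up to 300
block <- rep(seq_len(length(x) / 100), each = 100)

# Apply CV within each block; the result holds one value per block
cv_by_block <- tapply(x, block, CV)
length(cv_by_block)
```

Swapping CV for mean in the tapply() call would give the block means instead.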

score 0 · Answer 1 · answered Nov 27 '17 at 12:00

It may be easier to melt the data, add a column identifier (1:300, repeated 100 times each), and then summarize by that column.

So something like:

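library(dplyr)
library(tidyr)    # provides gather()
library(ggplot2)  # provides ggplot()

# Assumes df holds a single row with 30000 columns
means <- df %>%
   gather(Key, Value) %>%                     # long format: one row per original column
   mutate(ID = rep(1:300, each = 100)) %>%    # label consecutive blocks of 100
   group_by(ID) %>%                           # group by block only, since Key is unique per value
   summarize(Mean = mean(Value))

ggplot(means) +
   geom_point(aes(x = ID, y = Mean))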

You'll have to customize the code, since I don't have the data structure...
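
If reshaping to long format feels heavy for a single row, the same block means can probably be obtained in base R by folding the row into a matrix. This is only a sketch under assumptions: the data frame df below is simulated stand-in data with one row and 30000 numeric columns, and the names blocks and block_means are made up for illustration.

```r
# Hypothetical stand-in: a data frame with one row and 30000 numeric columns
set.seed(42)
df <- as.data.frame(matrix(runif(30000), nrow = 1))

# Pull the row of interest out as a plain numeric vector
x <- unlist(df[1, ], use.names = FALSE)

# Fill a 100 x 300 matrix column by column, so column k holds
# entries (100 * (k - 1) + 1) through (100 * k) of the row
blocks <- matrix(x, nrow = 100)

# One mean per block of 100
block_means <- colMeans(blocks)

plot(seq_along(block_means), block_means,
     xlab = "Block of 100 entries", ylab = "Mean")
```

This relies on R filling matrices column-wise, and on the row length being an exact multiple of 100; for other statistics, apply(blocks, 2, f) works with any function f of a numeric vector.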
