This solution uses the packages dplyr, purrr, tidyr and ggplot2 (for plotting).
# example dataset
df = read.table(text = "
name attr1 attr2
supplier1 10 87
supplier1 11 88
supplier1 12 89
supplier1 13 21
supplier2 20 31
supplier2 21 75
supplier2 22 75
supplier3 30 47
supplier3 19 22
", header = TRUE, stringsAsFactors = FALSE)
library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)
df %>%
  group_by(name) %>%                                      # for each supplier
  nest() %>%                                              # nest the data
  mutate(MEANS = map(data, ~ .x %>% summarise_all(mean)), # mean of the remaining columns
         PLOTS = map2(data, name,                         # plot the data, using the supplier as the title
                      ~ ggplot(data = .x) +
                        geom_point(aes(attr1, attr2)) +
                        ggtitle(.y))) -> df_upd          # save this as a new data frame
# check what your new dataset looks like
df_upd
# # A tibble: 3 x 4
# name data MEANS PLOTS
# <chr> <list> <list> <list>
# 1 supplier1 <tibble [4 x 2]> <tibble [1 x 2]> <S3: gg>
# 2 supplier2 <tibble [3 x 2]> <tibble [1 x 2]> <S3: gg>
# 3 supplier3 <tibble [2 x 2]> <tibble [1 x 2]> <S3: gg>
For each supplier you now have a column data (a list of data frames with the actual data), a column MEANS (a list of data frames with the calculated means) and a column PLOTS (a list of plots of your data).
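Because MEANS is a list-column of one-row data frames, it can be stacked back into a single summary table. A minimal sketch, assuming tidyr 1.0 or later (for unnest(cols = ...)). It rebuilds df_upd without the PLOTS column so it runs on its own; the ungroup() call and the name means_table are illustrative choices, not part of the solution above:

```r
library(dplyr)
library(purrr)
library(tidyr)

# same example dataset as above
df <- read.table(text = "
name attr1 attr2
supplier1 10 87
supplier1 11 88
supplier1 12 89
supplier1 13 21
supplier2 20 31
supplier2 21 75
supplier2 22 75
supplier3 30 47
supplier3 19 22
", header = TRUE, stringsAsFactors = FALSE)

# nest per supplier and compute the means (plots omitted in this sketch)
df_upd <- df %>%
  group_by(name) %>%
  nest() %>%
  ungroup() %>%                      # work on a plain tibble from here on
  mutate(MEANS = map(data, ~ summarise_all(.x, mean)))

# unnest the MEANS list-column: one row per supplier, one column per attribute
means_table <- df_upd %>%
  select(name, MEANS) %>%
  unnest(cols = MEANS)

print(means_table)
```

This is handy when you want one file with all suppliers' means instead of one file per supplier.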
So far, then, you've created a new data frame with the original info (column data) plus the calculated info (columns MEANS and PLOTS). The next step is to save the new info in separate files, as you mentioned:
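# save each MEANS dataset in a separate CSV file named after the supplier
# (walk2() instead of map2(), since we only want the side effect of writing files)
walk2(df_upd$MEANS, df_upd$name, ~ write.csv(.x, paste0(.y, ".csv"), row.names = FALSE))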
# save each plot separately, passing the plot explicitly through the `plot` argument
walk2(df_upd$PLOTS, df_upd$name, ~ ggsave(paste0(.y, ".png"), plot = .x))
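To try the saving step without cluttering your working directory, the sketch below writes the CSV files into a temporary folder and reads one back. The folder name supplier_means and the use of tempdir() are hypothetical choices for illustration; plots are left out so the example runs without a graphics device:

```r
library(dplyr)
library(purrr)
library(tidyr)

# same example dataset as above
df <- read.table(text = "
name attr1 attr2
supplier1 10 87
supplier1 11 88
supplier1 12 89
supplier1 13 21
supplier2 20 31
supplier2 21 75
supplier2 22 75
supplier3 30 47
supplier3 19 22
", header = TRUE, stringsAsFactors = FALSE)

df_upd <- df %>%
  group_by(name) %>%
  nest() %>%
  ungroup() %>%
  mutate(MEANS = map(data, ~ summarise_all(.x, mean)))

# hypothetical output folder inside the session's temporary directory
out_dir <- file.path(tempdir(), "supplier_means")
dir.create(out_dir, showWarnings = FALSE)

# one CSV per supplier, e.g. .../supplier_means/supplier1.csv
csv_paths <- file.path(out_dir, paste0(df_upd$name, ".csv"))
walk2(df_upd$MEANS, csv_paths, ~ write.csv(.x, .y, row.names = FALSE))

# read one file back to check what was written
supplier2_means <- read.csv(csv_paths[df_upd$name == "supplier2"])
print(supplier2_means)
```

Building the full paths first with file.path() keeps the file-naming logic in one place, separate from the writing step.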
Note that you can access the info in this data frame like in any other data frame. For example, df_upd$MEANS gives you the list of data frames with all calculated means, df_upd$MEANS[df_upd$name == "supplier2"] gives you the calculated means for supplier2 (wrapped in a list of length one), df_upd$data[df_upd$name == "supplier3"] gives you the original attributes for supplier3, and so on.
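The difference between [ and [[ matters when pulling a single supplier out: [ keeps the list wrapper, while [[ returns the data frame itself. A small sketch, assuming dplyr's filter() and pull() and purrr's pluck(); the variable names are illustrative, and df_upd is rebuilt without PLOTS so the example is self-contained:

```r
library(dplyr)
library(purrr)
library(tidyr)

# same example dataset as above
df <- read.table(text = "
name attr1 attr2
supplier1 10 87
supplier1 11 88
supplier1 12 89
supplier1 13 21
supplier2 20 31
supplier2 21 75
supplier2 22 75
supplier3 30 47
supplier3 19 22
", header = TRUE, stringsAsFactors = FALSE)

df_upd <- df %>%
  group_by(name) %>%
  nest() %>%
  ungroup() %>%
  mutate(MEANS = map(data, ~ summarise_all(.x, mean)))

# `[` returns a list of length one containing the tibble
wrapped <- df_upd$MEANS[df_upd$name == "supplier2"]

# `[[` with a single position returns the tibble itself
supplier2_means <- df_upd$MEANS[[which(df_upd$name == "supplier2")]]

# dplyr route: keep the matching row, pull the list-column, take its first element
supplier3_data <- df_upd %>%
  filter(name == "supplier3") %>%
  pull(data) %>%
  pluck(1)

class(wrapped)
supplier2_means
supplier3_data
```

The dplyr version reads well in a pipeline; the [[ version avoids loading anything beyond base R once df_upd exists.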