
I have a list like this one (in reality it is very big):

name         attr1         attr2
supplier1    10            87
supplier1    11            88
supplier1    12            89
supplier1    13            21
supplier2    20            31
supplier2    21            75
supplier2    22            75
supplier3    30            47
supplier3    19            22

I need to work with each supplier separately.

Let's say I need to compute mean and plot a graph for each supplier. Furthermore, let's say I need to save each mean (txt file) and each graph (pdf/jpeg file) with the name of the supplier (which I should get from the list).

I am very new to R, so if you can, an explanation would be really appreciated!

2 Answers


I believe there must be some duplicates for this question on SO. However, as this question is asking to create separate output files for each aggregation level I'm not sure if a dupe is easily found.

You can try to work your way along the following suggestions:

library(data.table)
setDT(DF)[, lapply(.SD, mean), by = name]
        name attr1    attr2
1: supplier1  11.5 71.25000
2: supplier2  21.0 60.33333
3: supplier3  24.5 34.50000
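Since the question asks for an explanation, here is a commented, self-contained sketch of the same call. DF is rebuilt from the sample in the question, and the result names means and means_attr1 are only illustrative, not part of the original answer.

```r
library(data.table)

# Sample data copied from the question; in practice DF would be your own table.
DF <- data.frame(
  name  = rep(c("supplier1", "supplier2", "supplier3"), times = c(4, 3, 2)),
  attr1 = c(10, 11, 12, 13, 20, 21, 22, 30, 19),
  attr2 = c(87, 88, 89, 21, 31, 75, 75, 47, 22)
)

setDT(DF)  # convert the data.frame to a data.table in place (no copy)

# by = name         : form one group of rows per supplier
# .SD               : the rows of the current group, without the grouping column
# lapply(.SD, mean) : apply mean() to every remaining column of that group
means <- DF[, lapply(.SD, mean), by = name]

# .SDcols limits which columns end up in .SD, here only attr1
means_attr1 <- DF[, lapply(.SD, mean), by = name, .SDcols = "attr1"]
```

Restricting .SDcols is worth considering when the real table has non-numeric columns, because mean() would return NA with a warning for those.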

If you need a separate txt file for each supplier:

setDT(DF)[, fwrite(c(name = name, lapply(.SD, mean)), paste0(name, ".txt")), by = name]

To create a file for each name containing an individual graph:

library(ggplot2)
DF[, {ggplot(.SD) + aes(attr1, attr2) + geom_point() + ggtitle(name);
  ggsave(paste0(name, ".png"))}, by = name]

E.g., file supplier1.png will contain a scatter plot of attr2 against attr1 for the supplier1 rows, with supplier1 as the title.
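If you would rather avoid extra packages, the same per-supplier workflow can be sketched in base R with split() and a for loop. This is an alternative to the data.table calls above, not a rewrite of them; out_dir is a hypothetical output folder (a temporary directory here) and DF is rebuilt from the sample in the question.

```r
# Base R sketch: one text file of means and one PNG per supplier.
DF <- read.table(text = "
name      attr1 attr2
supplier1 10    87
supplier1 11    88
supplier1 12    89
supplier1 13    21
supplier2 20    31
supplier2 21    75
supplier2 22    75
supplier3 30    47
supplier3 19    22
", header = TRUE, stringsAsFactors = FALSE)

out_dir <- tempdir()  # assumption: replace with the folder you want to write to

# split() returns a named list with one data frame per supplier
by_supplier <- split(DF, DF$name)

for (supplier in names(by_supplier)) {
  d <- by_supplier[[supplier]]

  # column means of the numeric attributes, as a one-row data frame
  means <- data.frame(name = supplier, t(colMeans(d[, c("attr1", "attr2")])))
  write.table(means, file.path(out_dir, paste0(supplier, ".txt")),
              row.names = FALSE, quote = FALSE)

  # open a PNG device, draw the plot, then close the device to write the file
  png(file.path(out_dir, paste0(supplier, ".png")))
  plot(d$attr1, d$attr2, main = supplier, xlab = "attr1", ylab = "attr2")
  dev.off()
}
```

The supplier name comes straight from names(by_supplier), so the file names always match the values in the name column.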

Uwe

This solution is using packages dplyr, purrr, tidyr and ggplot2 (for plotting purposes).

# example dataset
df = read.table(text = "
                name         attr1         attr2
                supplier1    10            87
                supplier1    11            88
                supplier1    12            89
                supplier1    13            21
                supplier2    20            31
                supplier2    21            75
                supplier2    22            75
                supplier3    30            47
                supplier3    19            22
                ", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)


df %>%
  group_by(name) %>%                                        # for each supplier
  nest() %>%                                                # nest data
  mutate(MEANS = map(data, ~ .x %>% summarise_all(mean)),   # obtain mean of rest of columns
         PLOTS = map2(data, name,                           # plot data and use the supplier as a title
                     ~ggplot(data = .x) +
                       geom_point(aes(attr1, attr2)) +
                       ggtitle(.y))) -> df_upd              # save this a new data frame

# check what your new dataset looks like
df_upd

# # A tibble: 3 x 4
#          name             data            MEANS    PLOTS
#         <chr>           <list>           <list>   <list>
#   1 supplier1 <tibble [4 x 2]> <tibble [1 x 2]> <S3: gg>
#   2 supplier2 <tibble [3 x 2]> <tibble [1 x 2]> <S3: gg>
#   3 supplier3 <tibble [2 x 2]> <tibble [1 x 2]> <S3: gg>

For each supplier value you have column data (a list of data frames with actual data), column MEANS (a list of data frames with the calculated means) and column PLOTS (a list of plots of your data).

So far, you've created a new data frame that holds the original info (column data) plus the calculated info (columns MEANS and PLOTS). The next step is to save the new info in separate files, as you've mentioned:

# save each MEANS dataset in a separate file named after the supplier (e.g. supplier1.txt)
walk2(df_upd$MEANS, df_upd$name, ~ write.csv(.x, paste0(.y, ".txt"), row.names = FALSE))

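# save each plot in a separate file named after the supplier (e.g. supplier1.png);
# pass the plot explicitly, because ggsave() otherwise falls back to last_plot()
walk2(df_upd$PLOTS, df_upd$name, ~ ggsave(paste0(.y, ".png"), plot = .x))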

Note that you can access the info of this data frame like any other data frame. For example, df_upd$MEANS gives you the list of data frames with all calculated means, df_upd$MEANS[df_upd$name == "supplier2"] gives you the calculated means for supplier2, df_upd$data[df_upd$name == "supplier3"] gives you the (original) attributes for supplier3, etc.
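One detail that may help when accessing list-columns: subsetting with [ returns a list of length one, so [[1]] is needed to get at the data frame inside. A minimal sketch, assuming the same sample data (the PLOTS column is left out to keep it short, and means_s2 and all_means are illustrative names):

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data copied from the question
df <- data.frame(
  name  = rep(c("supplier1", "supplier2", "supplier3"), times = c(4, 3, 2)),
  attr1 = c(10, 11, 12, 13, 20, 21, 22, 30, 19),
  attr2 = c(87, 88, 89, 21, 31, 75, 75, 47, 22),
  stringsAsFactors = FALSE
)

df_upd <- df %>%
  group_by(name) %>%
  nest() %>%
  mutate(MEANS = map(data, ~ summarise_all(.x, mean))) %>%
  ungroup()

# [ keeps the list wrapper; [[1]] extracts the one-row data frame of means
means_s2 <- df_upd$MEANS[df_upd$name == "supplier2"][[1]]
means_s2$attr1  # mean of attr1 for supplier2: 21

# unnest() flattens the list-column back into ordinary columns
all_means <- df_upd %>%
  select(name, MEANS) %>%
  unnest(MEANS)
```

The flattened all_means table has one row per supplier, which is convenient if a single summary file is enough.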

AntoniosK