I have a dataframe that looks like the following:

Date    Value  Company
1/2/13  10     Company1
1/2/14  20     Company2
1/2/15  30     Company1
1/2/16  40     Company3
1/2/17  50     Company2
1/2/18  60     Company3

I would like to subset this dataframe to create 3 different dataframes (one for each unique company). I have been using

assets <- unique(df$Company)
length(assets)
asset1 <- df %>% filter(Company == assets[1])
asset2 <- df %>% filter(Company == assets[2])
asset3 <- df %>% filter(Company == assets[3])

but this gets time-consuming when there are 50+ assets.

I was wondering if there was a function in dplyr or base packages that can create the dataframes in a more efficient manner.

Thank you for your help.

EDIT:

I have tried to create a time series plot using

by_asset <- df %>% group_by(Company)


plots = ggplot(data = by_asset) + aes(x = Date, y = Value) +
    geom_point()

but the plot comes back blank.

nfalesit
  • `assets <- split(df, df$Company)` will produce a [list of frames](https://stackoverflow.com/a/24376207/3358227), the recommended way of storing and processing similarly-structured frames. But I wonder if what you ultimately need will be sufficiently handled using `dplyr::group_by` instead of breaking the frame into smaller frames. – r2evans Feb 16 '22 at 22:42
  • Ultimately I will be making a time series graph for each company and overlaying them on one plot. Would you recommend using `split()` or `group_by()` for this? – nfalesit Feb 16 '22 at 22:46
  • If your only need for splitting them is to plot different lines on a single plot, then certainly `group_by` is the way to go. Base graphics can still be used without problem, but I think `ggplot2` makes it much easier in the end for plotting that kind of data. (It makes working with multiple frames a little cumbersome ... *much* smoother to keep it all together and use `aes`thetics (in ggplot-speak).) – r2evans Feb 16 '22 at 23:16
  • Just edited my post to reflect some changes where I tried using `group_by` and `ggplot2` but my syntax must be way off as it still did not work. – nfalesit Feb 16 '22 at 23:32
  • @nfalesit If you're looking for a plot that has a different colored line for each company, you might try something simple such as `ggplot(data = df, aes(x = Date, y = Value, color = Company)) + geom_line()`. You wouldn't need to create different data.frames or `group_by` the data. If I'm misunderstanding, please ignore the suggestion. – Ben Feb 17 '22 at 01:40

1 Answer

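You can nest your dataframe. Grouping by `Company` and calling `nest()` collapses the frame to one row per company and adds a list-column named `data` that holds each company's rows as its own data frame. You can then use `lapply` or `purrr::map` to apply a function to every nested frame.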

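library(tidyverse)  # dplyr, tidyr (nest) and purrr (map)

# One row per company; `data` is a list-column of that company's rows
df <- df %>% 
  group_by(Company) %>% 
  nest()

# Apply a function to each nested frame; both calls are equivalent
# (the `\(x)` lambda shorthand needs R >= 4.1)
lapply(df$data, \(x) mutate(x, mean = mean(Value)))
map(df$data, ~ mutate(., mean = mean(Value)))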
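
The `split()` route suggested in the comments can be sketched in base R as follows. This is only an illustration under stated assumptions: the sample data is retyped from the question, the dates are assumed to be month/day/year text, and the names `assets`, `company_means` and `p` are just placeholders. The plotting part follows the `color = Company` suggestion from the comments and runs only if `ggplot2` is installed.

```r
# Sample data copied from the question; dates assumed to be month/day/year text
df <- data.frame(
  Date    = c("1/2/13", "1/2/14", "1/2/15", "1/2/16", "1/2/17", "1/2/18"),
  Value   = c(10, 20, 30, 40, 50, 60),
  Company = c("Company1", "Company2", "Company1",
              "Company3", "Company2", "Company3")
)
df$Date <- as.Date(df$Date, format = "%m/%d/%y")

# One data frame per company, held in a named list
assets <- split(df, df$Company)
names(assets)          # company names become the list names
assets[["Company2"]]   # rows for a single company

# Apply the same step to every frame, here the mean Value per company
company_means <- sapply(assets, function(x) mean(x$Value))

# For one plot with a line per company, no splitting is needed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = Date, y = Value, color = Company)) +
    geom_line() +
    geom_point()
  print(p)  # a plot stored in a variable is only drawn when printed
}
```

Keeping the frames in one named list means the same code works for 3 or 50+ companies, and any per-company step becomes a single `lapply`/`sapply` call instead of one variable per asset.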
Calleros