0

I am very new to coding and just started doing some R graphics and now I am kinda lost with my data analyse and need some light! I am training some analyses and I got a very long dataset with 19 Countries x 12 months x 22 Products and for every month a Profit. Kinda like this:

Country   Month   Product Profit
Brazil    Jan     A      50
Brazil    fev     A      80
Brazil    mar     A      15
Austria   Jan     A      35
Austria   fev     A      80
Austria   mar     A      47
France    Jan     A      21
France    fev     A      66
France    mar     A      15
[...]
France    Dez     C      40 etc...

I am was thinking to do one graph showing the profits through the year and another for every country, so I could see the top and bottom 2 countries. I wanted to have something like:

All Countries   Jan   106        or     Brazil   2021   145
All Countries   Fev   146               Austria  2021   162
All Countries   Mar   77                France   2021   112

but the sum function can't help with characters type and as I have a long List, idk how to sum only part of the column.

sorry if it got confusing.

2 Answers2

0

Using base R, you can try something along these lines.

# sum of profit per month
out1 <- tapply(df$Profit, df$Month, sum)

# sum of profit per year per country
out2 <- data.frame(
  profit = sapply(split(df, f = ~ df$Country + df$Year), function(x) sum(x$Profit))
)
out2$Country <- gsub('\\.[0-9]*', '', rownames(out2))
out2$Year <- gsub('[a-zA-z]*\\.', '', rownames(out2))
rownames(out2) <- NULL

Output

> out1
Apr Aug Feb Jan Jul Jun Mar May Sep 
105 198 240 150  63 141  45 240  45 

> head(out2)
  profit Country Year
1    162 Austria 2019
2    145  Brazil 2019
3    102  France 2019
4    162 Austria 2020
5    145  Brazil 2020
6    102  France 2020

Data

# sample data
df <- data.frame(
  Country = rep(c(rep('Brazil',3L),rep('Austria',3L),rep('France',3L)), 3L),
  Profit = rep(c(50,80,15,35,80,47,21,66,15), 3L),
  Month = rep(c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep'),3L),
  Year = sort(rep(c(2019,2020,2021), 9L))
)
Dion Groothof
  • 1,406
  • 5
  • 15
0

The package dplyr has quite a natural syntax for this:

require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- data.frame(
  Country = rep(c(rep("Brazil", 3L), rep("Austria", 3L), rep("France", 3L)), 3L),
  Profit = rep(c(50, 80, 15, 35, 80, 47, 21, 66, 15), 3L),
  Month = rep(c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"), 3L),
  Year = sort(rep(c(2019, 2020, 2021), 9L))
)
df %>%
  group_by(Country, Month) %>%
  summarize(sum = sum(Profit))
#> `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
#> # A tibble: 9 × 3
#> # Groups:   Country [3]
#>   Country Month   sum
#>   <chr>   <chr> <dbl>
#> 1 Austria Apr     105
#> 2 Austria Jun     141
#> 3 Austria May     240
#> 4 Brazil  Feb     240
#> 5 Brazil  Jan     150
#> 6 Brazil  Mar      45
#> 7 France  Aug     198
#> 8 France  Jul      63
#> 9 France  Sep      45
Bart
  • 1,267
  • 7
  • 18