0

I have an excel spread sheet with companies names and data regarding sales. Some companies are related and I want to group them together under the same row. I want each column in this row to have the sum of all related companies.

I'm using R, and would appreciate if someone can give me a direction on how to start.

Edit- Example of the data:

Company name Jan sales Jan expenses Jan revenue
Company A 1000 500 500
Company B 2000 500 1500
Company C 3000 400 2600
Company D 4000 100 3900

I want to group all this companies under a single row by the name Company A.

I want it to look like this:

Company name Jan sales Jan expenses Jan revenue
Company A 10000 1500 8500
jennyro
  • 77
  • 7
  • 2
    It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). Also have you tried https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group ? – Ronak Shah Mar 25 '21 at 10:06
  • 1
    Does this answer your question? [How to sum a variable by group](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) – Marcelo Avila Mar 25 '21 at 14:06

2 Answers2

0

So first of all you have to read your csv-file by using something like:

mycompany <- read.csv("C:/your path here/my_file.csv", sep=";")

To group some companies together you need an id variable that tells you which company is related to which other company. So to help you further it would be helpful if you could provide us an overview of your csv structure.

So give us the output of:

header(mycompany)

EDIT:

to get the above output you could do:

new.table <- c()
new.table <- data.frame(new.table)

for(i in 2:ncol(mycompany)){
  new.table[1,1] <- as.character(mycompany[1,1])
  new.table[1,i] <- sum(mycompany[,i])
}

new.table
  • Hi, please see above the edit, I added an example of the data frame. thanks again! – jennyro Mar 25 '21 at 11:10
  • In general, how do you know which companies belong together? Is there a shared id among those or do you just "know" it by their names? So why do you know that companies B, C, and D belong to company A? – evian_bottle Mar 25 '21 at 12:15
  • No shared ID, it's something I just now. – jennyro Mar 31 '21 at 06:21
  • Can you explain a bit more about the code you wrote? What is supposed to be written inside c()? What do I need to change? – jennyro Mar 31 '21 at 06:22
  • So I just edited my post above. First import your csv to R (my first code field). Than go directly to my last code field and run it (no changes needed) – evian_bottle Mar 31 '21 at 08:38
  • The c() basically just creates an empty vector which then is transformed to an empty data frame. This empty data frame is then filled inside the for-loop with the new data – evian_bottle Mar 31 '21 at 08:43
0

So if your related companies are stored in a vector say related, you can proceed as follows.

library(dplyr)

data <- read.table(text = "'Company name'   'Jan sales' 'Jan expenses'  'Jan revenue'
'Company A' 1000    500 500
'Company B' 2000    500 1500
'Company C' 3000    400 2600
'Company D' 4000    100 3900", header = T)

> data
  Company.name Jan.sales Jan.expenses Jan.revenue
1    Company A      1000          500         500
2    Company B      2000          500        1500
3    Company C      3000          400        2600
4    Company D      4000          100        3900

related <- c("Company A", "Company B", "Company C", "Company D")

data %>% group_by(Company = ifelse(Company.name %in% related, "related companies", Company.name)) %>% 
  summarise(across(starts_with("Jan"), ~sum(.)))

# A tibble: 1 x 4
  Company           Jan.sales Jan.expenses Jan.revenue
* <chr>                 <int>        <int>       <int>
1 related companies     10000         1500        8500

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45