2

I have several files (20) with the same column structure but with different rows structure. All are composed of two columns, the first are factors and the second are integers. I want to sum the column of integers for the factors that are repeated and the new ones that are simply added. How could I combine and sum the ones already repeated?

I've thought about combining cbind and tapply but I really do not know how to carry this out.

A simple example of the structure of the files:

Shop   Clients     Shop  Clients     Shop Clients
 A        9          D      8          A     5
 B        7          A      4          R     4
 C        4          F      3          C     3
 D        2          B      1          B     2

I expect the output:

Shop Clients
A      18 
B      10
C       7
D      10
F       3
R       4

I read the different files in a loop, creating a dataset for each of them, so that the dataset shows City1$Shop and City1$Clients for example. This case is only for 20 files but I would like to know how to work with more (for example 100). How could I solve this problem by reading the datasets in this way?

f<-function(x){
  read.delim2("p01.txt",header=T,sep="\t",stringsAsFactors = FALSE)
}
for(i in x){
total<-f(i)
#Here I suppose I would combine and sum the datasets
}
Cris_91
  • 39
  • 5
  • So with the edit it now seems that you do not have **one** file with multiple columns named `"Shop"` and `"Clients"`. The example above are in fact several files? If so the posted solutions probably do not work. – Rui Barradas May 20 '19 at 18:19
  • Also, is there a pattern in the filenames? If this is the case then they can be read in with `fnames <- list.files(pattern = )` followed by a `lapply` loop. See [here](https://stackoverflow.com/questions/3397885/how-do-you-read-in-multiple-txt-files-into-r) and [here](https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once). – Rui Barradas May 20 '19 at 18:20
  • I thought at first that I had explained it in a clear way, but when I saw the answers I realized that I had to edit it and show in detail how I read the data. It's my first question and I'm learning to ask questions in a clear way. Sorry! I had thought about creating a "total" dataframe in which to add and sum each dataframe that I am reading, or is it better all at once? I'm stuck with this problem. – Cris_91 May 20 '19 at 18:30
  • Do **all filenames** start with a `"p"` followed by one or more digits and have a `.txt` extension? – Rui Barradas May 20 '19 at 18:31
  • Yeah, I read them in this way ` x<-list.files(pattern=".txt") ` I have the problem to put them together and sum the columns – Cris_91 May 20 '19 at 18:34
  • I mean, I've managed to read them in a loop, the problem is what to do next with them. – Cris_91 May 20 '19 at 18:35
  • See the edit, I believe this is it. – Rui Barradas May 20 '19 at 18:39

3 Answers3

1

We can melt the data into 'long' format by specifying the measure columns with patterns of column names as 'Shop' 'Clients', then grouped by 'Shop' get the sum of 'Clients'

library(data.table)
melt(setDT(df1), measure = patterns("^Shop", "^Clients"), 
  value.name = c("Shop", "Clients"))[, .(Clients = sum(Clients)), by = Shop]
#    Shop Clients
#1:    A      18
#2:    B      10
#3:    C       7
#4:    D      10
#5:    F       3
#6:    R       4

Or using tidyverse

library(tidyverse)
map_dfc(list(Shop = "Shop", Clients = "Clients"), ~
    df1 %>%
       select(matches(.x)) %>% 
       unlist) %>% 
  group_by(Shop) %>% 
  summarise(Clients = sum(Clients))
# A tibble: 6 x 2
#  Shop  Clients
#  <chr>   <int>
#1 A          18
#2 B          10
#3 C           7
#4 D          10
#5 F           3
#6 R           4

Or with rowsum from base R

i1 <- grepl("^Shop", names(df1))
rowsum(unlist(df1[!i1]), group =  unlist(df1[i1]))

data

df1 <- structure(list(Shop = c("A", "B", "C", "D"), Clients = c(9L, 
 7L, 4L, 2L), Shop.1 = c("D", "A", "F", "B"), Clients.1 = c(8L, 
 4L, 3L, 1L), Shop.2 = c("A", "R", "C", "B"), Clients.2 = 5:2), 
  class = "data.frame", row.names = c(NA, -4L))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

A possibility is to use aggregate after reshaping the data to a long format.

inx <- grep("Shop", names(df1))
long <- do.call(rbind, lapply(inx, function(i) df1[i:(i + 1)]))
aggregate(Clients ~ Shop, long, sum)
#  Shop Clients
#1    A      18
#2    B      10
#3    C       7
#4    D      10
#5    F       3
#6    R       4

Edit.
After the edit to the question, I believe the following does what is asked. I will once again use aggregate.

fnames <- list.files(pattern = "\\.txt")
df_list <- lapply(fnames, read.table, header = TRUE)
df_all <- do.call(rbind, df_list)
aggregate(Clients ~ Shop, data = df_all, sum)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • It works! I was complicating the problem with a loop, it is much simpler than I thought. Thank you very much for the help and patience! – Cris_91 May 20 '19 at 18:48
  • What if I have multiple headers not just two as in( Clients ~ Shop)?? – Amal Nasir May 25 '21 at 20:54
  • @AmalNasir Can you give an example? It is possible to aggregate by several columns, say, `Clients ~ Shop + Month`. – Rui Barradas May 25 '21 at 21:38
  • @RuiBarradas Yes, I posted my question here, https://stackoverflow.com/questions/67692461/sum-up-tables-results-from-multiple-sheets-into-one-table-in-r – Amal Nasir May 25 '21 at 22:57
1

One tidyverse possibility could be:

df %>%
 select_at(vars(contains("Shop"))) %>%
 gather(var1, val1) %>%
 bind_cols(df %>%
 select_at(vars(contains("Client"))) %>%
 gather(var2, val2)) %>%
 group_by(Shop = val1) %>%
 summarise(Clients = sum(val2))

  Shop  Clients
  <chr>   <int>
1 A          18
2 B          10
3 C           7
4 D          10
5 F           3
6 R           4

The same with base R:

long_df <- data.frame(Shop = stack(df[, grepl("Shop", names(df))])[, 1], 
Clients = stack(df[, grepl("Client", names(df))])[, 1])
aggregate(Clients ~ Shop, long_df, sum)

  Shop Clients
1    A      18
2    B      10
3    C       7
4    D      10
5    F       3
6    R       4
tmfmnk
  • 38,881
  • 4
  • 47
  • 67