1

I have been struggling firstly to merge columns between two particular data frames, and also to merge rows within a dataframe and adding up their values. I want to first add columns "X" and "Y" from Table 1 onto the end of Table 2. In table 2 some of the "Towns" such as "Town A" are repeated. I want to merge this rows whilst adding up the data in the rows.

Table 1

       |        X|         Y |     
  |Town|
  |A   |       21|         23|           
  |A   |       21|         23|               
  |B   |       21|         23|               
  |C   |       21|         23|                
  |D   |       21|         23|                
  |D   |       21|         23|                
  |E   |       21|         23|                 
  |E   |       21|         23|               
  |F   |       21|         23|                
  |F   |       21|         23|                    

Table 2

       |Species A| Species B | Species C| Species D| Species E | Species F |    
  |Town|
  |A   |       21|         23|        15|        0 |         3 |          7| 
  |A   |       21|         23|        15|        0 |         3 |          7| 
  |B   |       21|         23|        15|        0 |         3 |          7| 
  |C   |       21|         23|        15|        0 |         3 |          7| 
  |D   |       21|         23|        15|        0 |         3 |          7| 
  |D   |       21|         23|        15|        0 |         3 |          7| 
  |E   |       21|         23|        15|        0 |         3 |          7| 
  |E   |       21|         23|        15|        0 |         3 |          7| 
  |F   |       21|         23|        15|        0 |         3 |          7| 
  |F   |       21|         23|        15|        0 |         3 |          7|     

Some of the code I have attempted to use are the c.bind and merge function, and also I have tried to use run.seq as shown here:

run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))
L <- list(df1, df2) 
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq("Town"))) 
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]

Which didn't quite work.

What code is best suited for this type of merge/combination?

I will attach the structure of my tables below if that helps:

Table 1

structure(list(Town = c("A", "A", "B", "C", "D", "D", "E", "E", "F", "F"), Captured = c(168L, 16L, 243L, 12L, 17L, 15L, 7L, 233L, 14L, 12L), Proportion = c(0.23, 0.02, 0.33, 0.02, 0.02, 0.02, 0.01, 0.32, 0.02, 0.02)), class = "data.frame", .Names = c("Town", "Captured", "Proportion"), row.names = c(NA,-10L))

Table 2

structure(c(106L, 7L, 5L, 4L, 4L, 4L, 4L, 18L, 5L, 3L, 38L, 6L, 234L, 6L, 8L, 5L, 3L, 203L, 4L, 7L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 1L, 0L, 0L, 1L, 0L, 2L, 0L, 0L, 20L, 2L, 3L, 2L, 5L, 5L, 0L, 7L, 5L, 2L), .Dim = c(10L, 6L), .Dimnames = structure(list(Town = c("A", "A", "B", "C", "D", "D", "E", "E", "F", "F"), Species = c("funestus", "gambiae", "indeterminada", "outro", "pharoenois", "tenebrosus")), .Names = c("Town", "Species")), class = "table")
Jaap
  • 81,064
  • 34
  • 182
  • 193
Aetos
  • 50
  • 8
  • When I use cbind, say something like `<- cbind(ID, vp["Village"])` The number of rows increases to 60 rows and the all the data within the table becomes muddled. Which has happened with a few variations of cbind and aggregate I have tried – Aetos Mar 26 '16 at 09:07
  • Or something like `merge(x=df1, y=df2, by = "Town", all.x = TRUE)` a similar thing occurs. – Aetos Mar 26 '16 at 09:10
  • The same thing happens, the species, instead of being column headings are listed in a new column. The number of rows increases to 60 and the information in the table is mixed up. – Aetos Mar 26 '16 at 09:21
  • 2
    Make your example reproducible so we can play with it...post `dput` instead of `str` – Sotos Mar 26 '16 at 09:22
  • 1
    Some info on how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Mar 26 '16 at 09:27
  • couple of problems: Table1 is a `data.frame`, Table 2 is a `table`. You cannot merge by `"Town"`, as in the second Table the column name is `"Towm"`. Lastly, If Town appears multiple times in Table 1 and Table 2, u will get more rows, as it' s not clear, if first Observation with Town A in Table 1 belongs to first or second observation with Town A in Table 2. Analogously with second observation with Town A in Table 1. – adibender Mar 26 '16 at 09:30
  • OK thanks for explaining I was getting multiple rows. So I need to go back a bit in my code and produce Table2 as a dataframe to start with, to make things compatible? – Aetos Mar 26 '16 at 09:41
  • Are you sure I can see it on my end? I will try posting it on the end again. – Aetos Mar 26 '16 at 09:43
  • so to be clear, you want to bind all columns except `Town`? – Sotos Mar 26 '16 at 12:17
  • Yes and then merge the rows by Town "name". – Aetos Mar 26 '16 at 12:19
  • Check answer...is that what you need? – Sotos Mar 26 '16 at 12:26

1 Answers1

3

It is probaly best to summarise first and then merge / join the two datasets. With the table-format of table 2 you can also use the melt and dcast functions of reshape2 with sum as aggregate function (which results in a dataframe) and then merge with the aggregated t1 dataframe:

library(reshape2)
# aggragate 't1'
t1sum <- aggregate(.~Town, t1, sum)
# reshape and aggregate 't2'
t2sum <- dcast(melt(t2), Town ~ Species, fun.aggregate = sum)
# or with 'as.data.frame(t2)' instead of 'melt(t2)'
t2sum <- dcast(as.data.frame(t2), Town ~ Species, fun.aggregate = sum)

merge(t1sum, t2sum, by = 'Town')

gives:

  Town Captured Proportion funestus gambiae indeterminada outro pharoenois tenebrosus
1    A      184       0.25      113      44             1     2          2         22
2    B      243       0.33        5     234             0     0          1          3
3    C       12       0.02        4       6             0     0          0          2
4    D       32       0.04        8      13             0     0          1         10
5    E      240       0.33       22     206             0     0          2          7
6    F       26       0.04        8      11             0     0          0          7

Using the data.table package you could do a similar operation:

library(data.table)
t1dt <- setDT(t1)[, lapply(.SD, sum), by = Town]
t2dt <- dcast(setDT(melt(t2)), Town ~ Species, fun.aggregate = sum)

t1dt[t2dt, on='Town']

Used data:

t1 <- structure(list(Town = c("A", "A", "B", "C", "D", "D", "E", "E", "F", "F"), Captured = c(168L, 16L, 243L, 12L, 17L, 15L, 7L, 233L, 14L, 12L), Proportion = c(0.23, 0.02, 0.33, 0.02, 0.02, 0.02, 0.01, 0.32, 0.02, 0.02)), class = "data.frame", .Names = c("Town", "Captured", "Proportion"), row.names = c(NA,-10L))
t2 <- structure(c(106L, 7L, 5L, 4L, 4L, 4L, 4L, 18L, 5L, 3L, 38L, 6L, 234L, 6L, 8L, 5L, 3L, 203L, 4L, 7L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 1L, 0L, 0L, 1L, 0L, 2L, 0L, 0L, 20L, 2L, 3L, 2L, 5L, 5L, 0L, 7L, 5L, 2L), .Dim = c(10L, 6L), .Dimnames = structure(list(Town = c("A", "A", "B", "C", "D", "D", "E", "E", "F", "F"), Species = c("funestus", "gambiae", "indeterminada", "outro", "pharoenois", "tenebrosus")), .Names = c("Town", "Species")), class = "table")
Jaap
  • 81,064
  • 34
  • 182
  • 193