0

I have a table that looks like this:

table of neighbouring countries

Further down in the table, the countries in Target.Country are repeated in Source.Country, therefore repeating the combinations but with different numbers, sums and means. Is it possible for when the combinations are the same, sum the remaining columns together and add an additional column to find the average?

For example:

Source.Country Target.Country number   sum_intensity   mean_intensity
North Korea     South Korea    26492     10674.9         0.402
South Korea     North Korea    34912     53848.3         1.542

To be:

Source.Country Target.Country  number  sum_intensity mean_intensity  Average 
North Korea     South Korea     61404   64523.2         1.944         1.05

Any help would be great!

Axeman
  • 32,068
  • 8
  • 81
  • 94
  • Remember to tag with the language! This looks like [tag:r] to me. – Axeman Jun 15 '18 at 11:38
  • Thank you! It is R. – Michelle Ward Jun 15 '18 at 11:51
  • 1
    Please add a real sample of your data, which people can import in R. A screenshot is not very helpful. – Wimpel Jun 15 '18 at 11:54
  • `library(dplyr); df %>% mutate(grp = purrr::map2_chr(Source.Country, Target.Country, ~paste(sort(c(.x, .y))))) %>% group_by(grp) %>% summarise(number = sum(number), sum_intensity = sum(sum_intensity), mean_intensity = sum(mean_intensity), average = mean(mean_intensity))` ? Can't test without data that can be read in. – Axeman Jun 15 '18 at 11:56
  • possibly with `collapse = ' '` inside the `paste` command – Axeman Jun 15 '18 at 12:01
  • I have added the csv to Google Docs if that is helpful? https://drive.google.com/file/d/1AkcHWRzvc-mz84GRv0aw_roQtRpWEDtg/view?usp=sharing – Michelle Ward Jun 15 '18 at 12:13
  • @Axeman I get this error: Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "function" – Michelle Ward Jun 15 '18 at 12:14
  • Please read the info about how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. You need to replace `df` with the names of your data.frame. – Axeman Jun 15 '18 at 12:15

2 Answers2

0

A similar solution to what @Axeman proposed in the comments:

library(purrr)
library(dplyr)
df=data.frame(Source.Country=c('North Korea', 'South Korea'), 
              Target.Country=c('South Korea', 'North Korea'),
              number=c(26492, 34912),
              sum_intensity=c(10674.9, 53848.3),
              mean_intensity=c(0.402, 1.542))

df %>% mutate(grp = purrr::map2_chr(Source.Country, Target.Country, ~paste(sort(c(as.character(.x), as.character(.y))), collapse=' '))) %>% 
    group_by(grp) %>% 
    summarise(number = sum(number), 
    sum_intensity = sum(sum_intensity), 
    mean_intensity = sum(mean_intensity), 
    average = sum_intensity/number)

# # A tibble: 1 x 5
#   grp                     number sum_intensity mean_intensity average
#   <chr>                    <dbl>         <dbl>          <dbl>   <dbl>
# 1 North Korea South Korea 61404.        64523.           1.94    1.05

A few minor tweaks:

  • it does require collapse in the paste command
  • needs as.character to prevent the country names from being coerced into integers
  • mean_intensity can't be used as an output in the summary, then as an input, but an average of averages doesn't make much sense when number is unbalanced anyway. I just recalculated the average from the sums
heathobrien
  • 1,027
  • 7
  • 11
0

I increased the dataframe to check if the code was working properly

df1<-rbind(c( "North Korea ","South Korea" ,       26492 ,    10674.9 ,        
0.402), c(   "South Korea",  "North Korea"  ,  34912  ,   53848.3   ,      1.542),
c( "Canada ","South Korea" ,       26492 ,    10674.9 ,        0.402),
c(   "South Korea",  "Canada"  ,  34912  ,   53848.3   ,      1.542))
colnames(df1)<-c("Source.Country",  "Target.Country",  "number",  "sum_intensity",  
"mean_intensity")
df1<-data.frame(df1)

df1$number<-as.numeric(as.character(df1$number))
df1$sum_intensity<-as.numeric(as.character(df1$sum_intensity))
df1$mean_intensity<-as.numeric(as.character(df1$mean_intensity))

df1$Countries<-apply(cbind(df1$Source.Country, df1$Target.Country), 1, function(x) 
paste(sort(x), collapse=" "))

#
library(reshape)
m1 <- aggregate(df1$number~df1$Countries,data=df1,FUN=mean)
m2 <- aggregate(df1$sum_intensity~df1$Countries,data=df1,FUN=mean)
m3 <- aggregate(df1$mean_intensity~df1$Countries,data=df1,FUN=mean)

mvtab <- merge(rename(m1,c(y="number")),
               rename(m2,c(y="sum_intensity")))
               
mtab2<-merge(mvtab, rename(m3,c(y="mean_intensity")))
Community
  • 1
  • 1
Marta
  • 27
  • 7