I have a dataset that contains text variables and the codes applied to them in a qualitative analysis. A new row is generated each time a code is applied, so a sentence with three codes appears as three rows. I want to merge these rows into one per sentence, keeping the other variables as they are and summing the code variables.

I have searched for a way to do this but can't find one.

library(tibble)

example <- tibble(segments = c('Brexit is bad', 'Brexit is bad', 'We need a sit on the table', 'We need a sit on the table'),
   actor = c("SNP", "SNP", "Labour", "Labour"),
   year = c(2015, 2015, 2017, 2017),
   TL_Brexit = c(1, 0, 0, 0),
   Bre_negative = c(0, 1, 0, 0),
   TL_participation = c(0, 0, 1, 0),
   TD_other = c(0, 0, 0, 1))

There are two quotations here, each coded with two codes. I want to merge them into 2 rows instead of 4, so that the 1s and 0s in the code variables are summed, while the segments, actor and year variables stay the same because they are identical within each quotation. The result should look like this:

desiredoutput<-tibble(segments=c('Brexit is bad','We need a sit on the table'),
   actor=c("SNP", "Labour"),
   year=c(2015, 2017),
   TL_Brexit=c(1,0),
   Bre_negative=c(1,0),
   TL_participation=c(0,1),
   TD_other=c(0,1))

Any help will be more than welcome!

Nuria

1 Answer

If you group by segments, actor, and year, you can then summarise each group by taking the sum of the other columns.

library(dplyr)

example %>% 
  group_by(segments, actor, year) %>% 
  summarise_all(sum)

# # A tibble: 2 x 7
# # Groups:   segments, actor [2]
#   segments                 actor  year TL_Brexit Bre_negative TL_participation TD_other
#   <chr>                    <chr> <dbl>     <dbl>        <dbl>            <dbl>    <dbl>
# 1 Brexit is bad            SNP    2015         1            1                0        0
# 2 We need a sit on the ta~ Labo~  2017         0            0                1        1
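
In dplyr 1.0.0 and later, summarise_all() is superseded by across(). A minimal sketch of the same aggregation in that style, assuming a recent dplyr and rebuilding the question's example data so the snippet runs on its own, might look like this. The name merged is only illustrative, and .groups = "drop" is an optional choice that returns an ungrouped tibble.

```r
library(dplyr)

# Example data copied from the question
example <- tibble(
  segments = c("Brexit is bad", "Brexit is bad",
               "We need a sit on the table", "We need a sit on the table"),
  actor = c("SNP", "SNP", "Labour", "Labour"),
  year = c(2015, 2015, 2017, 2017),
  TL_Brexit = c(1, 0, 0, 0),
  Bre_negative = c(0, 1, 0, 0),
  TL_participation = c(0, 0, 1, 0),
  TD_other = c(0, 0, 0, 1)
)

# One row per (segments, actor, year); every remaining column is summed.
# across() never selects the grouping columns, so everything() covers
# only the code variables here.
merged <- example %>%
  group_by(segments, actor, year) %>%
  summarise(across(everything(), sum), .groups = "drop")

merged
```

Because the grouping columns identify each quotation, any non-numeric column that is not part of group_by() would make sum() fail, so such columns would need to be added to the grouping or summarised differently.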
IceCreamToucan