-1

I have a dataset of two columns by 13893 rows as follows:

trip    species
120318  ADHJ
120918  FJIW
120918  ADHJ
180817  ADHJ
180817  FJIW
180817  FJIW
099217  ADHJ

I want to change it so that the species names become column headers and the count of each species per trip is calculated. The result would be as follows:

trip    ADHJ    FJIW
120318  1       0
120918  1       1
180817  1       2
099217  1       0
pogibas
  • 27,303
  • 19
  • 84
  • 117
DND
  • 113
  • 2
  • 7
  • 1
    Good old `reshape2` can do this in one line `reshape2::dcast(dataset, trip ~ species, length)`. There should be a duplicate somewhere. – pogibas Feb 25 '19 at 06:23

1 Answers1

1

Data:

a <- read.table(header=T, stringsAsFactors=F, text="
trip    species
120318  ADHJ
120918  FJIW
120918  ADHJ
180817  ADHJ
180817  FJIW
180817  FJIW
099217  ADHJ")

Base R

table(a$trip, a$species)
#         
#          ADHJ FJIW
#   99217     1    0
#   120318    1    0
#   120918    1    1
#   180817    1    2
xtabs(~ trip + species, data = a)
#         species
# trip     ADHJ FJIW
#   99217     1    0
#   120318    1    0
#   120918    1    1
#   180817    1    2

dplyr

library(dplyr)
library(tidyr)
a %>%
  group_by(trip, species) %>%
  tally() %>%
  spread(species, n, fill = 0)
# # A tibble: 4 x 3
# # Groups:   trip [4]
#     trip  ADHJ  FJIW
#    <int> <dbl> <dbl>
# 1  99217     1     0
# 2 120318     1     0
# 3 120918     1     1
# 4 180817     1     2

data.table

library(data.table)
aDT <- as.data.table(a)
dcast(aDT, trip ~ species, fill = 0)
# Using 'species' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
#      trip ADHJ FJIW
# 1:  99217    1    0
# 2: 120318    1    0
# 3: 120918    1    1
# 4: 180817    1    2
r2evans
  • 141,215
  • 6
  • 77
  • 149