Matching and Adding Factor Counts in R Data Frames

Question

My question stems out of this and this question asked previously and sufficiently different from them I think. Imagine that I have a minimal dataset (bird) where every row represents an observation of birds at a given time and place as follows:

id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null

What I ideally want is a conversion of this dataset into a form like this removing the nulls from the dataset:

id,tot_obs,country_tot
A,12,2
B,3,1
C,5,2

I know that I can get a count of factors using:

table(bird$country)

but, is there a smarter, perhaps, one line way of removing the nulls, adding up the total counts, finding the counts of the countries and then reconfiguring them into this form? If there is a package which does this, then I am open to that suggestion as well. Thanks !

Shouldn't the `tot_obs` for first row be 12? – Arun Mar 20 '13 at 22:18 — Arun, Mar 20 '13 at 22:18
Apologies. It was a typo. I have edited it now. – Shion Mar 20 '13 at 22:20 — Shion, Mar 20 '13 at 22:20
Shion, yes, in that case, the answer below should do it. – Arun Mar 20 '13 at 22:21 — Arun, Mar 20 '13 at 22:21

Arun · Accepted Answer · 2013-03-21T06:58:28.597

2

Load data with stringsAsFactors=FALSE:

df <- read.csv(header=TRUE, text="id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null", stringsAsFactors=FALSE)

# check to see if columns are factors
sapply(df, class)
#          id         obs     country 
# "character"   "integer" "character"

Remove all rows with country = null

df <- df[df$country != "null", ]

Then you can use plyr package with summarise to get the desired result as follows:

ddply(df, .(id), summarise, tot_obs=sum(obs), tot_country=length(unique(country)))
#   id tot_obs tot_country
# 1  A      12           2
# 2  B       3           1
# 3  C       5           2

edited Mar 21 '13 at 06:58

answered Mar 20 '13 at 22:16

Arun

116,683
26
284
387

I tried out the exact code as above. It gave me an error: `Error in Summary.factor(c(45L, 3L, 165L, 3L, 165L, 3L, 70L, 165L, 165L, : sum not meaningful for factors` – Shion Mar 21 '13 at 01:41
yes. make sure your columns are not factors. load your csv file with `stringsAsFactors=FALSE` if need be. – Arun Mar 21 '13 at 06:48
Perfect ! Thank you so much for your help. It all works well now. – Shion Mar 21 '13 at 17:17

Matching and Adding Factor Counts in R Data Frames

1 Answers1

Linked