
I am fairly new to R and am working with a large data frame. Basically, I am trying to turn something like this:

   Year Sample Species Catch
1  2016      1       a     9
2  2016      1       b     5
3  2016      1       c    13
4  2016      1       d     2
5  2016      1       e     4
6  2016      1       f    13
7  2016      2       a     7
8  2016      2       c     5
9  2016      2       f     6
10 2016      2       g     2

into this:

   Year Sample Species Catch
1  2016      1       a     9
2  2016      1       b     5
3  2016      1       c    13
4  2016      1       d     2
5  2016      1       e     4
6  2016      1       f    13
7  2016      1       g     0
8  2016      1       h     0
9  2016      1       i     0
10 2016      1       j     0
11 2016      1       k     0
12 2016      2       a     7
13 2016      2       b     0
14 2016      2       c     5
15 2016      2       d     0
16 2016      2       e     0
17 2016      2       f     6
18 2016      2       g     2
19 2016      2       h     0
20 2016      2       i     0
21 2016      2       j     0
22 2016      2       k     0

That is, there is a fixed set of species (a through k), and wherever a species has no record in a given Sample, I want a row for it with a Catch of 0.
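For reference, the example table can be rebuilt as an R data frame, and the requested result can be described as a cross join of the existing Year/Sample pairs with the full species list, followed by a left join of the real catches. The base R sketch below assumes Species is stored as character (not factor) and that the species list is exactly `letters[1:11]`; the names `keys`, `grid` and `full` are illustrative only.

```r
# The question's example data; Species kept as character, not factor
df <- data.frame(Year    = 2016,
                 Sample  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 Species = c("a", "b", "c", "d", "e", "f", "a", "c", "f", "g"),
                 Catch   = c(9, 5, 13, 2, 4, 13, 7, 5, 6, 2),
                 stringsAsFactors = FALSE)

# Assumed fixed species list, 'a' through 'k'
all.species <- letters[1:11]

# Cross join: merge() with no shared columns returns the Cartesian product,
# pairing every observed Year/Sample combination with every species
keys <- unique(df[, c("Year", "Sample")])
grid <- merge(keys, data.frame(Species = all.species, stringsAsFactors = FALSE))

# Left join the recorded catches onto the grid; absent species get NA, then 0
full <- merge(grid, df, all.x = TRUE)
full$Catch[is.na(full$Catch)] <- 0

# Sort and reset row names for a clean printout
full <- full[order(full$Year, full$Sample, full$Species), ]
rownames(full) <- NULL
full
```

Because `keys` only holds pairs that actually occur, this does not invent Year/Sample combinations that were never sampled.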

Thanks!

Tony1990
  • Check out `tidyr::complete` as shown in the answers to the linked duplicate. Its `fill` argument lets you use 0 as the filling value instead of NA. – aosmith Sep 16 '16 at 19:35
  • http://stackoverflow.com/questions/10438969/fastest-way-to-add-rows-for-missing-values-in-a-data-frame , http://stackoverflow.com/questions/18780918/add-row-to-dataframe-based-on-presence-criteria , http://stackoverflow.com/questions/28073752/r-how-to-add-rows-for-missing-values-for-unique-group-sequences , http://stackoverflow.com/questions/31150028/insert-missing-time-rows-into-a-dataframe – user20650 Sep 16 '16 at 21:28
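The `tidyr::complete` suggestion from the comments might look like the sketch below. It assumes the tidyr package is installed and, again, that the full species list is `letters[1:11]`. `nesting(Year, Sample)` restricts the expansion to Year/Sample pairs present in the data, and `fill` takes a named list giving the value used for each column in the added rows.

```r
library(tidyr)

# The question's example data
df <- data.frame(Year    = 2016,
                 Sample  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 Species = c("a", "b", "c", "d", "e", "f", "a", "c", "f", "g"),
                 Catch   = c(9, 5, 13, 2, 4, 13, 7, 5, 6, 2),
                 stringsAsFactors = FALSE)

# Keep observed Year/Sample pairs, cross them with the (assumed) full
# species list, and use 0 instead of NA for Catch in the new rows
full <- complete(df, nesting(Year, Sample), Species = letters[1:11],
                 fill = list(Catch = 0))
full
```

If Species were a factor whose levels already ran from a to k, `complete(df, nesting(Year, Sample), Species, fill = list(Catch = 0))` would pick up all levels without listing them explicitly.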

1 Answer


How about this?

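all.species <- letters[1:11]            # 'a' through 'k'
samples <- split(df, df$Sample)         # one data frame per Sample
new.df <- NULL
for (smp in samples) {                  # 'smp' avoids masking base::sample
  missing.species <- setdiff(all.species, unique(smp$Species))
  # Skip samples that already have every species: data.frame() errors when
  # a zero-length column is mixed with length-1 columns
  if (length(missing.species) > 0) {
    smp <- rbind(smp, data.frame(Year = unique(smp$Year),
                                 Sample = unique(smp$Sample),
                                 Species = missing.species, Catch = 0))
  }
  new.df <- rbind(new.df, smp[order(smp$Species), ])
}
new.df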

with output

    Year Sample Species Catch
1   2016      1       a     9
2   2016      1       b     5
3   2016      1       c    13
4   2016      1       d     2
5   2016      1       e     4
6   2016      1       f    13
7   2016      1       g     0
8   2016      1       h     0
9   2016      1       i     0
10  2016      1       j     0
11  2016      1       k     0
72  2016      2       a     7
51  2016      2       b     0
82  2016      2       c     5
61  2016      2       d     0
71  2016      2       e     0
92  2016      2       f     6
102 2016      2       g     2
81  2016      2       h     0
91  2016      2       i     0
101 2016      2       j     0
111 2016      2       k     0
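Since the data frame is large, growing `new.df` with `rbind` inside the loop copies it on every iteration. A sketch of the same per-sample idea that pads each piece with `lapply` and binds everything once with `do.call(rbind, ...)` is shown below; `pad_sample` is a hypothetical helper name, and splitting on both Year and Sample is an assumption in case Sample numbers repeat across years.

```r
# The question's example data
df <- data.frame(Year    = 2016,
                 Sample  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 Species = c("a", "b", "c", "d", "e", "f", "a", "c", "f", "g"),
                 Catch   = c(9, 5, 13, 2, 4, 13, 7, 5, 6, 2),
                 stringsAsFactors = FALSE)

# Hypothetical helper: add zero-catch rows for species absent from one sample
pad_sample <- function(smp, all.species) {
  missing.species <- setdiff(all.species, smp$Species)
  if (length(missing.species) > 0) {
    smp <- rbind(smp, data.frame(Year = smp$Year[1], Sample = smp$Sample[1],
                                 Species = missing.species, Catch = 0,
                                 stringsAsFactors = FALSE))
  }
  smp[order(smp$Species), ]
}

all.species <- letters[1:11]
# One piece per observed Year/Sample pair, padded independently
pieces <- lapply(split(df, list(df$Year, df$Sample), drop = TRUE),
                 pad_sample, all.species = all.species)
# Bind all pieces in a single call instead of inside the loop
new.df <- do.call(rbind, pieces)
rownames(new.df) <- NULL
new.df
```

Resetting the row names also avoids the irregular labels (72, 51, 82, ...) that appear in the output above.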
Sandipan Dey