Construct a unique multi-part id with cumulative numbering by group

Question

I have a data frame:

data_source <-c("s1", "s1", "s1", "s1", "s1", "s1")
lake <-c("blue lake", "blue lake", "blue lake", "mirror lake", "mirror lake", 
         "mirror lake")
sample <-c("upper", "mid", "bottom", "upper", "mid", "bottom")
df <-data.frame(cbind(data_source, lake, sample))
df

I have constructed the following id columns:

df$source_id <- "s1"
df$lake_id <- paste(df$source_id, as.numeric(as.factor(df$lake)), sep = "_")
df$sample_id <- paste(df$lake_id, as.numeric(as.factor(df$lake_id)), sep = "_")

However, I want the sample_id column to look like this:

df$desired_id <-c("s1_l1_1", "s1_l1_2", "s1_l1_3", "s1_l2_1", "s1_l2_2", "s1_l2_3")

I can't figure out how to calculate a cumulative sample number by lake. Thanks in advance!

I think this is one solution: http://stackoverflow.com/questions/10029235/cumulative-count-in-r — user3791234, Jan 31 '16 at 02:04

score 0 · Answer 1 · answered Jan 31 '16 at 02:12

The first problem is that lake_id is built without the "l" before the lake number, and that omission carries over into sample_id. Keeping code close to yours, I'd do the following:

df$lake_id <- with(df, paste0(source_id, "_l", as.numeric(factor(lake))))
# "s1_l1" "s1_l1" "s1_l1" "s1_l2" "s1_l2" "s1_l2"

The difference is that lake_id now contains an "l" before the lake number.

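The rest of your code still needs one change. as.numeric(as.factor(df$lake_id)) returns the lake number again (1, 1, 1, 2, 2, 2), not a count within each lake. A per-lake running count such as ave(seq_along(df$lake_id), df$lake_id, FUN = seq_along) returns 1, 2, 3, 1, 2, 3, which is the suffix you want.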
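To check what the last step needs to produce, here is a minimal base R sketch that compares factor codes with a per-lake running count. The lake_id vector is typed in by hand for illustration, and group_code and sample_no are names made up for this example.

```r
# lake_id values as produced by the paste0() line above (typed in by hand here)
lake_id <- c("s1_l1", "s1_l1", "s1_l1", "s1_l2", "s1_l2", "s1_l2")

# Factor codes identify the group, so they repeat within each lake
group_code <- as.integer(factor(lake_id))

# ave() applies seq_along() within each group, giving a running count per lake
sample_no <- ave(seq_along(lake_id), lake_id, FUN = seq_along)

sample_id <- paste(lake_id, sample_no, sep = "_")
print(sample_id)
```

The factor codes only say which lake a row belongs to, while ave() restarts the count at 1 for every new lake_id.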

My preferred coding style when putting together multiple items like this is to use sprintf like this:

df$sample_id <- with(df, sprintf("%s_l%d_%d", source_id, as.integer(factor(lake)), ave(seq_along(lake), lake, FUN = seq_along)))

With that style it is easier to see which pieces are used and how they are joined, at the cost of having to read sprintf format strings.
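As an end-to-end illustration, the sketch below rebuilds the example data and produces the ids in one pass with base R. It assumes lakes should be numbered in order of first appearance, which is why it uses match() against unique() rather than factor codes (those follow alphabetical order). lake_no and sample_no are names chosen for this example.

```r
# Example data from the question; stringsAsFactors = FALSE keeps lake as character
df <- data.frame(
  data_source = rep("s1", 6),
  lake = rep(c("blue lake", "mirror lake"), each = 3),
  sample = rep(c("upper", "mid", "bottom"), times = 2),
  stringsAsFactors = FALSE
)

# Lake number: position of each lake in the list of distinct lakes
lake_no <- match(df$lake, unique(df$lake))

# Sample number: running count that restarts for every lake
sample_no <- ave(seq_along(df$lake), df$lake, FUN = seq_along)

# Join the three parts with a single format string
df$sample_id <- sprintf("%s_l%d_%d", df$data_source, lake_no, sample_no)
print(df$sample_id)
# "s1_l1_1" "s1_l1_2" "s1_l1_3" "s1_l2_1" "s1_l2_2" "s1_l2_3"
```

For this data both numbering schemes agree, because "blue lake" sorts before "mirror lake" and also appears first.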
