I have a data frame:

data_source <- c("s1", "s1", "s1", "s1", "s1", "s1")
lake <- c("blue lake", "blue lake", "blue lake", "mirror lake", "mirror lake",
          "mirror lake")
sample <- c("upper", "mid", "bottom", "upper", "mid", "bottom")
df <- data.frame(cbind(data_source, lake, sample))
df

I have constructed the following id columns:

df$source_id <- "s1"
df$lake_id <- paste(df$source_id, as.numeric(df$lake), sep = "_")
df$sample_id <- paste(df$lake_id, as.numeric(as.factor(df$lake_id)), sep = "_")

However, I want the sample_id column to look like this:

df$desired_id <-c("s1_l1_1", "s1_l1_2", "s1_l1_3", "s1_l2_1", "s1_l2_2", "s1_l2_3")

I can't figure out how to calculate a cumulative sample number by lake. Thanks in advance!

Erba Aitbayev
user3791234

1 Answer

Part of the issue is that you never add the "l" before the lake number in the lake_id column, and that omission carries through to the sample_id column. Using code similar to yours, I'd do the following:

df$lake_id <- with(df, paste0(source_id, "_l", as.integer(factor(lake))))
# "s1_l1" "s1_l1" "s1_l1" "s1_l2" "s1_l2" "s1_l2"

The only difference is that lake_id now contains an "l" before the lake number.

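Running the rest of your code as written still won't give the desired sample_id, though: as.numeric(as.factor(df$lake_id)) numbers the lakes, not the samples, so the last component just repeats the lake number (s1_l1_1, s1_l1_1, ...). You need a counter that restarts within each lake, which ave(seq_along(lake), lake, FUN = seq_along) provides.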
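A minimal sketch of that per-lake counter, assuming the samples of each lake should be numbered in the order their rows appear. It uses base R's ave() with seq_along; the names lake_num and sample_num are illustrative and not from the question.

```r
# Rebuild the example data; the code below works whether or not
# strings were converted to factors when the data frame was created.
df <- data.frame(
  data_source = "s1",
  lake = rep(c("blue lake", "mirror lake"), each = 3),
  sample = rep(c("upper", "mid", "bottom"), times = 2)
)

# Lake number: integer code of each lake (factor levels sort alphabetically)
lake_num <- as.integer(factor(df$lake))

# Sample number: a running count 1, 2, 3, ... that restarts within each lake
sample_num <- ave(seq_along(df$lake), df$lake, FUN = seq_along)

df$lake_id <- paste0(df$data_source, "_l", lake_num)
df$sample_id <- paste(df$lake_id, sample_num, sep = "_")
df$sample_id
# "s1_l1_1" "s1_l1_2" "s1_l1_3" "s1_l2_1" "s1_l2_2" "s1_l2_3"
```

Wrapping the lake column in factor() keeps this working on R 4.0 and later, where data.frame() no longer converts strings to factors by default.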

My preferred coding style when putting together multiple items like this is to use sprintf like this:

df$sample_id <- with(df, sprintf("%s_l%d_%d", source_id, as.integer(factor(lake)),
                                 ave(seq_along(lake), lake, FUN = seq_along)))

With that coding style, it is a bit easier to see what is used and how it is connected, at the cost of the added complexity of sprintf format strings.
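To make the format string concrete, here is a small sketch: %s takes a string, each %d takes an integer, and sprintf is vectorized, so one call builds every id. The helper make_sample_id() is hypothetical (not part of the question or of base R), and it assumes lakes are numbered in alphabetical order of their names.

```r
# Hypothetical helper: build "<source>_l<lake number>_<sample number>" ids
make_sample_id <- function(source_id, lake) {
  lake_num <- as.integer(factor(lake))                        # fills the first %d
  sample_num <- ave(seq_along(lake), lake, FUN = seq_along)   # fills the second %d
  sprintf("%s_l%d_%d", source_id, lake_num, sample_num)
}

ids <- make_sample_id("s1", rep(c("blue lake", "mirror lake"), each = 3))
ids
# "s1_l1_1" "s1_l1_2" "s1_l1_3" "s1_l2_1" "s1_l2_2" "s1_l2_3"

# The counter is computed per group, so rows need not be sorted by lake
mixed <- make_sample_id("s1", c("mirror lake", "blue lake", "mirror lake", "blue lake"))
mixed
# "s1_l2_1" "s1_l1_1" "s1_l2_2" "s1_l1_2"
```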

Bill Denney