Consider a data frame with two numeric columns and a categorical column containing a string:
d1 <- data.frame(x = c(0, 1, 2, 5, 6.5, 8), y = c(0, 2, 3, 5, 5.5, 5), category = "ValueA")
d2 <- data.frame(x = c(0, 1, 2, 4, 6, 8), y = c(0, 3, 3.5, 4, 4, 5), category = "ValueB")
df <- rbind(d1, d2)
> df
x y category
1 0.0 0.0 ValueA
2 1.0 2.0 ValueA
3 2.0 3.0 ValueA
4 5.0 5.0 ValueA
5 6.5 5.5 ValueA
6 8.0 5.0 ValueA
7 0.0 0.0 ValueB
8 1.0 3.0 ValueB
9 2.0 3.5 ValueB
10 4.0 4.0 ValueB
11 6.0 4.0 ValueB
12 8.0 5.0 ValueB
I want to prepend a number to the values of the category column, with the number increasing sequentially for each distinct categorical value ("ValueA", "ValueB", ...).
My take using dplyr:
library(dplyr)

# Distinct category values, in order of first appearance
categories <- unique(df$category)

for (i in seq_along(categories)) {
  # Rows of the i-th category, with the index prefixed to the label
  subset.df <- subset(df, category == categories[i]) %>%
    mutate(category = paste0(i, ".", categories[i]))
  if (i == 1) {
    results.df <- subset.df
  } else {
    results.df <- rbind(results.df, subset.df)
  }
}
> results.df
x y category
1 0.0 0.0 1.ValueA
2 1.0 2.0 1.ValueA
3 2.0 3.0 1.ValueA
4 5.0 5.0 1.ValueA
5 6.5 5.5 1.ValueA
6 8.0 5.0 1.ValueA
7 0.0 0.0 2.ValueB
8 1.0 3.0 2.ValueB
9 2.0 3.5 2.ValueB
10 4.0 4.0 2.ValueB
11 6.0 4.0 2.ValueB
12 8.0 5.0 2.ValueB
This works fine, but are there any better approaches? Making a data.frame for each distinct category string (as I'm doing within my loop) seems like overkill, especially when category has a large number of unique values. (I'm using two here as a minimal example!)
I'm pretty sure there are better ways to modify the string values directly (perhaps operations within the data frame?), but I'm lacking this knowledge. Any answers/pointers would be appreciated!
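For illustration, here is a minimal sketch of one vectorized direction, assuming the prefix should follow the order in which each category first appears in the column. It uses only base R (match(), unique(), paste0()); the name idx is introduced here for the example and is not part of the code above.

```r
# Rebuild the example data from the question
d1 <- data.frame(x = c(0, 1, 2, 5, 6.5, 8), y = c(0, 2, 3, 5, 5.5, 5), category = "ValueA")
d2 <- data.frame(x = c(0, 1, 2, 4, 6, 8), y = c(0, 3, 3.5, 4, 4, 5), category = "ValueB")
df <- rbind(d1, d2)

# For each row, look up the position of its category among the unique
# values (assumption: numbering follows order of first appearance)
idx <- match(df$category, unique(df$category))

# Build the prefixed labels in one vectorized call, with no per-category subsets
results.df <- df
results.df$category <- paste0(idx, ".", df$category)
```

The same expression should also work inside dplyr's mutate() if a pipeline is preferred. If the numbering should follow alphabetical order instead of first appearance, sorting the unique values before calling match() would be one way to get that.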