Here is the head of my df (the first 40 rows), as dput output:
structure(list(Code = c("75", "75", "75", "75", "75", "75", "75",
"75", "75", "75", "75", "75", "75", "R009", "R009", "R009", "R009",
"R009", "R009", "R009", "R009", "R009", "R009", "R009", "R009",
"R009", "R015", "R015", "R015", "R015", "R015", "R015", "R015",
"R015", "R019", "R019", "R019", "R019", "R019", "R019"), Name = c("a",
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "a",
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "a",
"f", "g", "h", "i", "k", "l", "m", "a", "b", "c", "d", "e", "f"
), n = c(41L, 14L, 7L, 5L, 11L, 138L, 4L, 92L, 19L, 10L, 167L,
67L, 62L, 3L, 1L, 35L, 6L, 125L, 43L, 4L, 44L, 86L, 8L, 33L,
37L, 13L, 8L, 32L, 1L, 3L, 2L, 17L, 2L, 7L, 45L, 14L, 10L, 8L,
15L, 228L)), row.names = c(NA, -40L), groups = structure(list(
Code = c("75", "R009", "R015", "R019"), .rows = structure(list(
1:13, 14:26, 27:34, 35:40), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
I'm trying to turn each n value into the equivalent number of rows. For example, the first row has Code == "75", Name == "a" and n = 41, so I'd like that Code/Name combination to be repeated as 41 rows in the dataframe.
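To make the target shape concrete, here is a minimal sketch on a toy two-row table, with values copied from the dput output above. It assumes tidyr's uncount() is a suitable tool for this; `toy` and `long` are hypothetical names, and I have not confirmed that this is the best approach for my real, grouped df.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for two rows of the summary table (values from the dput above).
toy <- tibble(
  Code = c("75", "R009"),
  Name = c("a", "b"),
  n    = c(41L, 1L)
)

# uncount() repeats each row as many times as its weight (here n)
# and, by default, drops the weight column from the result.
long <- toy %>% uncount(n)

nrow(long)          # 41 + 1 = 42 rows
table(long$Code)    # 41 rows for "75", 1 row for "R009"
```

Since my real df is a grouped_df, I assume I would call ungroup() before expanding it.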
I'm trying to do this because I want to see whether there is a strong correlation between Code and Name. Once I have a long dataframe with one row per observation, I plan to use the cor function like this:
cor(df$Code, df$Name)
But I think cor is going to throw an error because Name is not numeric, so I think I will first have to convert all the Names to numeric values:
library(dplyr)

# Recode each Name letter to a number (a = 1, ..., m = 13)
df <- df %>%
  mutate(Name = case_when(Name == "a" ~ 1,
                          Name == "b" ~ 2,
                          Name == "c" ~ 3,
                          Name == "d" ~ 4,
                          Name == "e" ~ 5,
                          Name == "f" ~ 6,
                          Name == "g" ~ 7,
                          Name == "h" ~ 8,
                          Name == "i" ~ 9,
                          Name == "j" ~ 10,
                          Name == "k" ~ 11,
                          Name == "l" ~ 12,
                          Name == "m" ~ 13))
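A more compact version of the same recoding might look like the sketch below. It assumes every Name is a single lowercase letter, so its position in R's built-in `letters` vector gives the same numbers as the case_when above; `name` and `name_num` are hypothetical names used only for illustration.

```r
# Hypothetical toy vector standing in for df$Name.
name <- c("a", "f", "m", "b")

# `letters` is base R's built-in vector "a", "b", ..., "z".
# match() returns the position of each value in that vector,
# so "a" -> 1, "b" -> 2, ..., "m" -> 13. Values not found become NA.
name_num <- match(name, letters)

name_num  # 1 6 13 2
```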
How do I turn the n value in the dataframe into the equivalent number of rows?
Also, does this workflow make sense? Is there a shortcut for finding the correlation here, other than turning the summary dataframe into something more like "raw" data, converting the columns to numeric values, and then comparing the two vectors?