I have a dataframe with a column trans holding phonetic transcriptions of words, and a column pos_num which records the position(s) of the phoneme t in each transcription string.

df <- data.frame(
  trans = c("ðət", "əˈpærəntli", "ˈkɒntrækt", "təˈwɔːdz", "pəˈteɪtəʊz"), stringsAsFactors = FALSE
)
df$pos_num <- sapply(strsplit(df$trans, ""), function(x) which(grepl("t", x)))

df
       trans pos_num
1        ðət       3
2 əˈpærəntli       8
3  ˈkɒntrækt    5, 9
4   təˈwɔːdz       1
5 pəˈteɪtəʊz    4, 7

In some transcriptions, t occurs more than once, so pos_num holds multiple values for that row. In those cases I would like to duplicate the entire row, so that the original row keeps one value and each duplicated row gets one of the remaining values. The desired output would be:

df
       trans pos_num
1        ðət       3
2 əˈpærəntli       8
3  ˈkɒntrækt       5
4  ˈkɒntrækt       9
5   təˈwɔːdz       1
6 pəˈteɪtəʊz       4
7 pəˈteɪtəʊz       7

How can this be achieved? (There seem to be a few posts on this question for other programming languages, but not for R.)

Chris Ruehlemann

1 Answer

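library(data.table)
# Convert df to a data.table by reference
setDT(df)
# Unlist the positions within each word, giving one row per position
df[, .(pos_num = unlist(pos_num)), by = .(trans)]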
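
For comparison, the same reshaping can be sketched in base R without data.table. This is only a minimal sketch, not part of the answer above: it assumes pos_num is stored as a list column (built here with lapply rather than sapply), and the name long is a hypothetical one chosen for the result. The idea is to repeat each word once per recorded position and flatten the positions alongside.

```r
# Rebuild the example data; pos_num is kept as a list column (assumption)
df <- data.frame(
  trans = c("ðət", "əˈpærəntli", "ˈkɒntrækt", "təˈwɔːdz", "pəˈteɪtəʊz"),
  stringsAsFactors = FALSE
)
df$pos_num <- lapply(strsplit(df$trans, ""), function(x) which(x == "t"))

# Number of positions recorded for each row
n <- lengths(df$pos_num)

# Repeat each word once per position and flatten the positions
long <- data.frame(
  trans = rep(df$trans, times = n),
  pos_num = unlist(df$pos_num),
  stringsAsFactors = FALSE
)

long
```

Unlike grouping by trans, this version repeats rows by index, so two rows that happen to contain the same word would stay separate.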
Vasily A