I am calculating the number of possible words given a list of strings of syllable combinations. The syllable combination list looks like this:
syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV", "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C", "CCCV-CV", "CV-C-CCCV")
On the basis of this list, I'd like to calculate the number of possible words in English given phonotactic rules. To do this, I need to go through the individual items in the syllable combinations list and calculate the number of possible words for each syllable combination.
To generate the number of possible words for a given syllable combination, I need to go through the syllable combination and look at each character in turn in relation to its environment. For a syllable combination such as "CV-CV", for instance, I need to do the following:
- identify that this word starts with a single consonant C (rather than 2 or 3 consonants);
- identify that this first single consonant is followed by a vowel V;
- identify that the word continues with a next syllable (indicated by the hyphen);
- identify that this second syllable also starts with a single consonant C;
- and ends with another vowel V.
This information needs to be connected with information on the sounds that can appear in these positions:
number_of_vowels <- 20
number_of_initial_consonants_length_1 <- 22
number_of_initial_consonants_length_2 <- 47
number_of_final_consonants_length_1 <- 24
The number of possible words with "CVCV" syllable structure in English can then be calculated as follows:
number_of_CVCV_words <- number_of_initial_consonants_length_1*number_of_vowels*number_of_initial_consonants_length_1*number_of_vowels
number_of_CVCV_words
193600
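The steps listed above could be generalized along the following lines. This is only a sketch: it uses the counts given in the question, it covers only syllables of the shape one or two initial consonants, one vowel, and an optional single final consonant, and the helper names count_syllable_shape and count_word are made up for illustration.

```r
# Counts taken from the question.
number_of_vowels <- 20
number_of_initial_consonants <- c(22, 47)  # onsets of length 1 and length 2
number_of_final_consonants_length_1 <- 24

# Hypothetical helper: number of sound combinations for one syllable of the
# form C{1,2} V C{0,1}. Any other shape is rejected rather than guessed.
count_syllable_shape <- function(syllable) {
  parts <- regmatches(syllable, regexec("^(C{1,2})V(C?)$", syllable))[[1]]
  if (length(parts) == 0) {
    stop("syllable shape not covered by this sketch: ", syllable)
  }
  onset_length <- nchar(parts[2])   # 1 or 2 initial consonants
  has_coda <- nchar(parts[3]) == 1  # is there a single final consonant?
  number_of_initial_consonants[onset_length] * number_of_vowels *
    (if (has_coda) number_of_final_consonants_length_1 else 1)
}

# Hypothetical helper: split a combination at the hyphens and multiply the
# per-syllable counts.
count_word <- function(combination) {
  syllables <- strsplit(combination, "-", fixed = TRUE)[[1]]
  prod(vapply(syllables, count_syllable_shape, numeric(1)))
}

cvcv_count <- count_word("CV-CV")
print(cvcv_count)  # 193600, the same value as the manual calculation
```

Shapes such as "CCCV" or a bare "C" would need counts that are not listed among the variables above, so the sketch stops with an error for them.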
Any advice on how to do this?
I've gotten a bit further with this, but have run into some problems.
First, split the syllable combinations into separate syllables:
split_syllables <- c()
for(i in seq_along(syllable_combinations)){
  split_syllable <- strsplit(as.character(syllable_combinations[i]), split = "-")
  split_syllables <- append(split_syllables, split_syllable)
}
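Since strsplit() is vectorized over its input, the same list can probably be built without a loop. A minimal sketch, reusing the syllable_combinations vector from the question:

```r
syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV",
                           "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C",
                           "CCCV-CV", "CV-C-CCCV")

# strsplit() returns a list with one character vector per input string;
# fixed = TRUE treats "-" as a literal character rather than a regex.
split_syllables <- strsplit(syllable_combinations, split = "-", fixed = TRUE)

print(split_syllables[[3]])       # "CCCV" "CCV"
print(lengths(split_syllables))   # number of syllables per combination
```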
Then, a function that matches each syllable (there is a limited number of unique syllables, so this is doable). The counter1 variable gives the number of possible sound combinations in English for that particular syllable structure:
detect_syllables <- function(syllable){
  if(syllable == "C") {
    counter1 <- 25
  } else if(syllable == "CC") {
    counter1 <- 528
  } else if(syllable == "CCCV") {
    counter1 <- 200
  } else if(syllable == "CCV") {
    counter1 <- 940
  } else if(syllable == "CV") {
    counter1 <- 440
  } else if(syllable == "CVC") {
    counter1 <- 10560
  } else {
    # print(syllable, "...") would fail, so signal the unmatched syllable with stop()
    stop(syllable, " syllable not matched")
  }
  # counter1 is local to this function, so return its value explicitly
  counter1
}
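One alternative to the if/else chain is a named vector used as a lookup table. The sketch below copies the counts from the function above; the names syllable_counts and lookup_syllable are hypothetical.

```r
# Number of possible sound combinations per syllable structure
# (values copied from detect_syllables in the question).
syllable_counts <- c(C = 25, CC = 528, CCCV = 200, CCV = 940,
                     CV = 440, CVC = 10560)

# Hypothetical helper: return the count for one syllable, or stop with an
# informative error if the syllable structure is not in the table.
lookup_syllable <- function(syllable) {
  if (!(syllable %in% names(syllable_counts))) {
    stop(syllable, " syllable not matched")
  }
  syllable_counts[[syllable]]
}

print(lookup_syllable("CV"))   # 440
```

Adding a new syllable structure then means adding one entry to the vector rather than another branch.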
Then, functions which carry out the detect_syllables function for each syllable in the original syllable combination:
one_syllable <- function(first_syllable){
lapply(split_syllables[[i]][1], FUN = detect_syllables)
counter1 -> first_syl
first_syl -> number1
print(number1)
}
two_syllables <- function(first_syllable, second_syllable){
lapply(split_syllables[[i]][1], FUN = detect_syllables)
counter1 -> first_syl
lapply(split_syllables[[i]][2], FUN = detect_syllables)
counter1 -> second_syl
first_syl*second_syl -> number2
print(number2)
}
three_syllables <- function(first_syllable, second_syllable, third_syllable){
lapply(split_syllables[[i]][1], FUN = detect_syllables)
counter1 -> first_syl
lapply(split_syllables[[i]][2], FUN = detect_syllables)
counter1 -> second_syl
lapply(split_syllables[[i]][3], FUN = detect_syllables)
counter1 -> third_syl
first_syl*second_syl*third_syl -> number3
print(number3)
}
four_syllables <- function(first_syllable, second_syllable, third_syllable, fourth_syllable){
lapply(split_syllables[[i]][1], FUN = detect_syllables)
counter1 -> first_syl
lapply(split_syllables[[i]][2], FUN = detect_syllables)
counter1 -> second_syl
lapply(split_syllables[[i]][3], FUN = detect_syllables)
counter1 -> third_syl
lapply(split_syllables[[i]][4], FUN = detect_syllables)
counter1 -> fourth_syl
first_syl*second_syl*third_syl*fourth_syl -> number4
print(number4)
}
And a for loop to make sure that the detect_syllables function is used appropriately:
for(i in 1:10){
if(length(split_syllables[[i]]) == 1) {
lapply(split_syllables[[i]][1], FUN = one_syllable)
} else if(length(split_syllables[[i]]) == 2) {
lapply(split_syllables[[i]][1], split_syllables[[i]][2], FUN = two_syllables)
} else if(length(split_syllables[[i]]) == 3) {
lapply(split_syllables[[i]][1], split_syllables[[i]][2], split_syllables[[i]][3], FUN = three_syllables)
} else if(length(split_syllables[[i]]) == 4) {
lapply(split_syllables[[i]][1], split_syllables[[i]][2], split_syllables[[i]][3], split_syllables[[i]][4], FUN = four_syllables)
} else
print("number of syllables is bigger than 4")
}
However, when I try to use the for loop, I get the following error message:
Error in four_syllables(split_syllables[[1]]) : object 'counter1' not found
I realize this has to do with the environment in which 'counter1' is evaluated, as mentioned here: Using get inside lapply, inside a function, but I don't know how to solve it. Neither of the lapply calls seems to like it if I try to point them to the right environment (Error in FUN("C"[[1L]], ...) : unused argument(s)).
The required result can be obtained very inelegantly by not using lapply(). If someone has another solution, I'd be happy to learn about it.
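For what it's worth, the error can likely be avoided without touching environments at all: a variable assigned inside a function is local to that call, so the caller has to capture the function's return value instead of looking for counter1. A minimal sketch, where detect_demo and two_syllables_demo are hypothetical, cut-down stand-ins for the functions above:

```r
# Cut-down stand-in for detect_syllables(): the count is returned to the
# caller instead of being left behind in a variable.
detect_demo <- function(syllable) {
  counter1 <- switch(syllable,
                     CCCV = 200, CCV = 940, CV = 440,
                     stop(syllable, " syllable not matched"))
  counter1  # counter1 is local; only this returned value reaches the caller
}

# The caller uses its own arguments and stores the returned values.
two_syllables_demo <- function(first_syllable, second_syllable) {
  first_syl <- detect_demo(first_syllable)
  second_syl <- detect_demo(second_syllable)
  first_syl * second_syl
}

result <- two_syllables_demo("CCCV", "CCV")
print(result)  # 188000
```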
for(i in 1:10){
if(length(split_syllables[[i]]) == 1) {
detect_syllables(split_syllables[[i]][1]) -> counter1
counter1 -> first_syl
first_syl -> number1
print(number1)
} else if(length(split_syllables[[i]]) == 2) {
detect_syllables(split_syllables[[i]][1]) -> counter1
counter1 -> first_syl
detect_syllables(split_syllables[[i]][2]) -> counter1
counter1 -> second_syl
first_syl*second_syl -> number2
print(number2)
} else if(length(split_syllables[[i]]) == 3) {
detect_syllables(split_syllables[[i]][1]) -> counter1
counter1 -> first_syl
detect_syllables(split_syllables[[i]][2]) -> counter1
counter1 -> second_syl
detect_syllables(split_syllables[[i]][3]) -> counter1
counter1 -> third_syl
first_syl*second_syl*third_syl -> number3
print(number3)
} else if(length(split_syllables[[i]]) == 4) {
detect_syllables(split_syllables[[i]][1]) -> counter1
counter1 -> first_syl
detect_syllables(split_syllables[[i]][2]) -> counter1
counter1 -> second_syl
detect_syllables(split_syllables[[i]][3]) -> counter1
counter1 -> third_syl
detect_syllables(split_syllables[[i]][4]) -> counter1
counter1 -> fourth_syl
first_syl*second_syl*third_syl*fourth_syl -> number4
print(number4)
} else
print("number of syllables is bigger than 4")
}
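A more compact version of the same calculation might look like the sketch below. It assumes the per-syllable counts from detect_syllables are stored in a named vector; syllable_counts and count_words are hypothetical names. Because prod() multiplies however many syllables there are, no separate branch per word length is needed.

```r
syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV",
                           "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C",
                           "CCCV-CV", "CV-C-CCCV")

# Counts copied from detect_syllables in the question.
syllable_counts <- c(C = 25, CC = 528, CCCV = 200, CCV = 940,
                     CV = 440, CVC = 10560)

# Hypothetical helper: split one combination into syllables, look up each
# syllable's count, and multiply the counts together.
count_words <- function(combination) {
  syllables <- strsplit(combination, "-", fixed = TRUE)[[1]]
  unmatched <- setdiff(syllables, names(syllable_counts))
  if (length(unmatched) > 0) {
    stop("syllable not matched: ", paste(unmatched, collapse = ", "))
  }
  prod(syllable_counts[syllables])
}

# One count per combination, named by the combination string.
word_counts <- vapply(syllable_combinations, count_words, numeric(1))
print(word_counts)
```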