I need to calculate the percentage of Japanese characters in each sentence in R. I have split the text into sentences, and the result looks like this:
> text
[1] "若い人が仕事がつまらない会社が面白くないというのはなぜか"
[2] "それは要するに自分のやることを人が与えてくれると思っているからです"
[3] "でも会社が自分にあった仕事をくれるわけではありません"
I want to get the number of hiragana characters in each sentence. I have a text file listing the hiragana characters to search for. I can do this for a single sentence, but I can't apply it to all sentences. For one sentence I do it like this:
> hiragana<-scan("hiragana.txt",what="char")
> hiragana<-unlist(strsplit(hiragana,"")) #hiragana list to search in sentences
> b<-text[1]
> b<-unlist(strsplit(b,"")) # so that I can search characters in the sentence
> b
[1] "若" "い" "人" "が" "仕" "事" "が" "つ" "ま" "ら" "な" "い" "会" "社"
[15] "が" "面" "白" "く" "な" "い" "と" "い" "う" "の" "は" "な" "ぜ" "か"
> b[(b %in% hiragana)]
[1] "い" "が" "が" "つ" "ま" "ら" "な" "い" "が" "く" "な" "い" "と" "い"
[15] "う" "の" "は" "な" "ぜ" "か"
> length(b[(b %in% hiragana)])
[1] 20
My question is how I can make this work for more than one sentence. I need output like this:
> output
[1] 20
[2] 27
[3] 20
My problem is similar to this, but I want to apply it to every sentence, not a specific one.
Any opinions?
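One possible way to extend the single-sentence steps above to the whole vector is sketched below. It relies on the fact that strsplit() already accepts a vector and returns a list with one character vector per sentence, so the %in% test can be applied to each element with sapply(). Two assumptions: the hiragana vector is built here from the Unicode range U+3041 to U+3096 as a stand-in for the contents of hiragana.txt (which is not shown), and the names chars, hiragana_count and hiragana_pct are just illustrative.

```r
# The three sentences from the question.
text <- c("若い人が仕事がつまらない会社が面白くないというのはなぜか",
          "それは要するに自分のやることを人が与えてくれると思っているからです",
          "でも会社が自分にあった仕事をくれるわけではありません")

# Assumption: stand-in for hiragana.txt, built from the Unicode
# hiragana letters U+3041..U+3096 (one character per element).
hiragana <- intToUtf8(0x3041:0x3096, multiple = TRUE)

# strsplit() is vectorised: the result is a list holding one
# character vector per sentence.
chars <- strsplit(text, "")

# Count, for each sentence, how many of its characters are hiragana.
hiragana_count <- sapply(chars, function(ch) sum(ch %in% hiragana))

# Percentage of hiragana per sentence (count / sentence length).
hiragana_pct <- 100 * hiragana_count / lengths(chars)

print(hiragana_count)
print(round(hiragana_pct, 1))
```

If the character list has to come from the file, replacing the intToUtf8() line with the scan() and strsplit() lines from the question should work the same way, since the rest of the code only needs hiragana to be a vector of single characters.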