
I would be really grateful for any advice on creating a loop or function. My goal is essentially manual stemming of a text string: collapsing several related terms into one term.

My code works fine for one term at a time, but it would save me a lot of time if I could iterate over the terms.

library(tibble)
library(stringr)

# data frame of the variant terms to be collapsed into one term
Babanov_stem <- c("бабановдун", "бабановду", "бабановтун", "бабанову", "бабановту", "бабановго", "бабановко", "бабановым", "бабановдон", "бабановтон",
           "бабанове", "бабановто", "babanova", "babanov", "babanovpresident",
           "бабанова")

# index column; seq_along() adapts if the number of variants changes
Babanov_seq <- seq_along(Babanov_stem)

# tibble() replaces the deprecated data_frame()
Babanov <- tibble(Babanov_seq, Babanov_stem)

# a single replacement works fine
tidy_KG17pre$word2 <- str_replace_all(tidy_KG17pre$word2, Babanov$Babanov_stem[15], "бабанов")

The single replacement works well, but I would really like to iterate, as I have to do this for approximately 25 terms for each of 5 candidates (Babanov is candidate 1).

# My poor effort at a for loop
for (i in seq(Babanov$Babanov_stem)){
tidy_KG17pre$word2 <- str_replace_all(tidy_KG17pre$word, Babanov_stem[i], "бабанов")
}

# My effort at Functional Programming appears to be a bit weak too
library(purrr)
tidy_KG17pre$word2 <- tidy_KG17pre$word %>% 
map(str_replace_all, Babanov$Babanov_stem, "бабанов") %>% 
reduce(append)

I would be really grateful for any thoughts on how to get any of the above to work :)

Robert Chestnutt
  • Please provide a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example – Val Mar 20 '18 at 11:21
  • Thank you so much for responding, Val and Paul. Is there a way to send the dataframe (it's an unnested tidy-format dataframe)? – Robert Chestnutt Mar 20 '18 at 11:34
  • Read the link provided by Val, you can use `dput`, or place it on a github gist if it's too big. – Paul Rougieux Mar 20 '18 at 11:37
  • Thanks so much, Paul and Val. Sorry, I'm very much a novice. The dataset is on github: rfche704/Kyrgyz-tidy-Dataset. There are also 2 datasets, the strings and the tidy version. I have been editing the tidy version. – Robert Chestnutt Mar 20 '18 at 12:04

2 Answers

I created a fake dataset to play with.

dtf <- data_frame(word = paste(Babanov_stem, "blabla"))
head(dtf)
# # A tibble: 6 x 1
#                word
#               <chr>
# 1 бабановдун blabla
# 2  бабановду blabla
# 3 бабановтун blabla
# 4   бабанову blabla
# 5  бабановту blabla
# 6  бабановго blabla

Replacing a single term, as you suggested:

dtf$word <- str_replace_all(dtf$word, Babanov$Babanov_stem[15], "бабанов")

Using a loop to replace every word in Babanov_stem with the word "бабанов":

for (w in Babanov$Babanov_stem){
    dtf$word <- str_replace_all(dtf$word, w, "бабанов")
}
head(dtf)
# # A tibble: 6 x 1
#             word
#            <chr>
# 1 бабанов blabla
# 2 бабанов blabla
# 3 бабанов blabla
# 4 бабанов blabla
# 5 бабанов blabla
# 6 бабанов blabla
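One caveat with the loop above: the variants are applied in the order they are stored, and "babanov" comes before "babanovpresident" in Babanov_stem, so a row containing "babanovpresident" ends up as "бабановpresident". The sketch below is one possible way around that, not part of the original answer: it sorts the variants longest first and joins them into a single alternation pattern. The names `ordered_stems`, `pattern`, `words`, and `cleaned` are made up for illustration, and it assumes the variants contain no regex metacharacters.

```r
library(stringr)

# Variants copied from the question
Babanov_stem <- c("бабановдун", "бабановду", "бабановтун", "бабанову", "бабановту",
                  "бабановго", "бабановко", "бабановым", "бабановдон", "бабановтон",
                  "бабанове", "бабановто", "babanova", "babanov", "babanovpresident",
                  "бабанова")

# Longest variants first, so "babanovpresident" is tried before "babanov"
ordered_stems <- Babanov_stem[order(nchar(Babanov_stem), decreasing = TRUE)]

# One alternation pattern; assumes no variant contains regex metacharacters
pattern <- paste(ordered_stems, collapse = "|")

# Hypothetical sample rows
words <- c("babanovpresident blabla", "бабановдун blabla", "бабанова blabla")

cleaned <- str_replace_all(words, pattern, "бабанов")
cleaned
```

Because the whole substitution is a single call, this also avoids rescanning the column once per variant.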

Note: You don't need seq() in the for loop.

The loop above modifies the column in place. This may be a case where functional programming is not recommended. See "Loops that should be left as is" in Hadley Wickham's book Advanced R.
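For comparison, if a functional version is still wanted, the same sequential replacement can probably be written with `purrr::reduce()`, which carries the partially cleaned vector from one variant to the next. This is only a sketch on a small subset of the variants; `stems`, `words`, and `cleaned` are illustrative names, not objects from the question.

```r
library(stringr)
library(purrr)

# A subset of the question's variants, kept short for illustration
stems <- c("бабановдун", "бабановду", "бабанова")

# Hypothetical sample rows
words <- c("бабановдун blabla", "бабановду blabla", "бабанова blabla")

# reduce() starts from `words` and applies one replacement per variant,
# feeding each result into the next call
cleaned <- reduce(
  stems,
  function(acc, stem) str_replace_all(acc, stem, "бабанов"),
  .init = words
)
cleaned
```

The result matches the loop's; whether it reads better than the loop is a matter of taste.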

Paul Rougieux

Thank you, Paul, for your help. I finally figured it out. The approach that worked for me is to use 'stringr' and adapt the common text-mining idiom for stripping URLs. That idiom matches a token containing 'www.' and replaces it with a space.

I did the same, but instead of 'www.' I used the stem of the candidate's name. It works in both the Latin and Cyrillic alphabets.

str_replace_all(KG17$message, "бабан[^[:blank:]]+", "babanov")
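As a hedged illustration of the same idea, the sketch below matches either spelling of the stem followed by any run of non-whitespace characters and assigns the result back to a variable. The sample vector `messages` and the names `pattern` and `cleaned` are invented here; it uses `\\S*` rather than `[^[:blank:]]+`, so the bare stem is matched too, and it writes the Cyrillic form to stay consistent with the question.

```r
library(stringr)

# Hypothetical sample messages, not taken from the real dataset
messages <- c("бабановдун blabla", "babanovpresident blabla", "no candidate here")

# Either spelling of the stem, then everything up to the next whitespace
pattern <- "(бабанов|babanov)\\S*"

cleaned <- str_replace_all(messages, pattern, "бабанов")
cleaned
```

A pattern like this would also swallow any unrelated word that happens to start with the stem, so it is worth checking a sample of the matches first.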
Robert Chestnutt