I'm using R/Quanteda and I'm trying to make a wordcloud from ONLY upper-case words. The text is from a bibliographic reference list in ABNT format, so doing this would keep only the authors' surnames. Any hints? Thanks!
Hi. Welcome to S.O! Please take the [tour](https://stackoverflow.com/tour) if you haven't already. This question would be greatly helped by a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), which is generally what we expect on this site. Learn more about how to make these with R [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-exampl). – Captain Hat Jul 02 '21 at 09:54
1 Answer
Base R
string <- "lowercase UPPERCASE more lower case UPPER 1143 + 40 = !!!"
gsub(" {2,}", " ", # replace 2 or more consecutive spaces with one space
gsub("[^A-Z ]", "", string) # remove anything that's not a space or an uppercase letter
)
#> [1] " UPPERCASE UPPER "
Created on 2021-07-02 by the reprex package (v2.0.0)
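For ABNT references specifically, `[^A-Z ]` also keeps one-letter initials (the `J` in `SILVA, J.`) and drops accented capitals such as `Ç` or `Ã`. A minimal base R sketch, assuming surnames are the only runs of two or more capital letters (acronyms such as `ISBN` would still slip through), extracts them and counts them, which is the input a word cloud needs. The two references are made-up sample data. `\\p{Lu}` with `perl = TRUE` matches any Unicode uppercase letter, so accented surnames should also be caught in a UTF-8 session.

```r
# Hypothetical sample references in ABNT style (made-up data)
refs <- c(
  "SILVA, J. A. Titulo do livro. Sao Paulo: Editora, 2019.",
  "SOUZA, M.; SILVA, J. A. Artigo. Revista, v. 2, 2020."
)

# Runs of 2+ uppercase letters not touching any other letter:
# skips initials ("J.") and capitalised words ("Titulo")
pattern <- "(?<!\\p{L})\\p{Lu}{2,}(?!\\p{L})"
surnames <- unlist(regmatches(refs, gregexpr(pattern, refs, perl = TRUE)))

# Frequency table, most frequent first (SILVA twice, SOUZA once)
freq <- sort(table(surnames), decreasing = TRUE)
freq
```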
stringr package
library(stringr)
string <- "lowercase UPPERCASE more lower case UPPER 1143 + 40 = !!!"
str_squish( # remove excess whitespace
  str_remove_all(string, "[^[:UPPER:] ]") # remove everything except uppercase letters and spaces
)
#> [1] "UPPERCASE UPPER"
Created on 2021-07-02 by the reprex package (v2.0.0)
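Since the question uses quanteda, the same filter can be applied at the token level before drawing the cloud. A minimal sketch, assuming quanteda v3 or later (where `textplot_wordcloud()` lives in the separate quanteda.textplots package) and reusing made-up ABNT references: `tokens_keep()` with `valuetype = "regex"` and `case_insensitive = FALSE` keeps only all-caps tokens of two or more letters, and `dfm(..., tolower = FALSE)` stops the surnames from being lowercased.

```r
library(quanteda)
library(quanteda.textplots)  # textplot_wordcloud() is here in quanteda v3+

# Hypothetical sample references in ABNT style (made-up data)
refs <- c(
  "SILVA, J. A. Titulo do livro. Sao Paulo: Editora, 2019.",
  "SOUZA, M.; SILVA, J. A. Artigo. Revista, v. 2, 2020."
)

toks <- tokens(refs, remove_punct = TRUE, remove_numbers = TRUE)

# Keep only tokens made entirely of 2+ uppercase letters;
# case_insensitive = FALSE matters, otherwise the regex ignores case
toks_upper <- tokens_keep(toks, pattern = "^\\p{Lu}{2,}$",
                          valuetype = "regex", case_insensitive = FALSE)

# tolower = FALSE keeps the surnames in capitals as feature names
dfmat <- dfm(toks_upper, tolower = FALSE)

# min_count defaults to 3, which would hide most surnames in a small corpus
textplot_wordcloud(dfmat, min_count = 1)
```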

Captain Hat