
I'm using R/Quanteda and I'm trying to make a word cloud from ONLY uppercase words. The text comes from bibliographic references in ABNT format, so doing this would keep only the authors' surnames. Any hints? Thanks!

  • Hi. Welcome to S.O! Please take the [tour](https://stackoverflow.com/tour) if you haven't already. This question would be greatly helped by a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), which is generally what we expect on this site. Learn more about how to make these with R [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-exampl). – Captain Hat Jul 02 '21 at 09:54

1 Answer


Base R

string <- "lowercase UPPERCASE more lower case UPPER  1143 + 40 = !!!"

gsub(" {2,}", " ", # replace 2 or more consecutive spaces with one space
  gsub("[^A-Z ]", "", string) # remove anything that's not a space or an uppercase letter
)
#> [1] " UPPERCASE UPPER "

Created on 2021-07-02 by the reprex package (v2.0.0)
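A word cloud needs individual words and their counts rather than one cleaned string, so here is a minimal base R sketch that extracts the uppercase words as a vector and tabulates them. The `refs` sample is hypothetical ABNT-style text, and the pattern assumes surnames are written entirely in capitals with at least two letters, so initials such as "J." are skipped. Uppercase acronyms (e.g. a publisher name) would be picked up as well.

```r
# Hypothetical ABNT-style references (surnames in capitals)
refs <- c(
  "SILVA, J. A. Titulo do livro. Sao Paulo: Atlas, 2010.",
  "SOUZA, M.; SILVA, P. Outro titulo. Revista, v. 2, 2015."
)

# Runs of 2+ uppercase letters not glued to other letters.
# \\p{Lu} with perl = TRUE is meant to also cover accented capitals
# such as "Ç" or "Ã", which [A-Z] would miss; {2,} skips initials.
pattern <- "(?<!\\p{L})\\p{Lu}{2,}(?!\\p{L})"
surnames <- unlist(regmatches(refs, gregexpr(pattern, refs, perl = TRUE)))

# Word frequencies, i.e. the input a word cloud needs
freq <- sort(table(surnames), decreasing = TRUE)
freq
```

If the wordcloud package is installed, something like `wordcloud::wordcloud(names(freq), as.integer(freq), min.freq = 1)` should draw it; `min.freq` defaults to 3, which would hide surnames that appear only once or twice.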

Stringr Package

library(stringr)

string <- "lowercase UPPERCASE more lower case UPPER  1143 + 40 = !!!"

str_squish( # remove excess whitespace
  str_remove_all(string, "[^[:UPPER:] ]") # remove everything except uppercase letters and spaces
)
#> [1] "UPPERCASE UPPER"

Created on 2021-07-02 by the reprex package (v2.0.0)
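Since the question mentions quanteda, the same filter can also be applied at the token level, which leads straight to a dfm for plotting. This is a sketch assuming quanteda v3 or later (where plotting moved to the separate quanteda.textplots package); `refs` is hypothetical sample text, and the regex again assumes surnames are all-caps words of two or more letters.

```r
library(quanteda)

# Hypothetical ABNT-style references; replace with your own text
refs <- c(
  "SILVA, J. A. Titulo do livro. Sao Paulo: Atlas, 2010.",
  "SOUZA, M.; SILVA, P. Outro titulo. Revista, v. 2, 2015."
)

# Tokenise and drop punctuation/numbers so "SILVA," becomes "SILVA"
toks <- tokens(refs, remove_punct = TRUE, remove_numbers = TRUE)

# Keep tokens made only of 2+ uppercase letters; case_insensitive
# defaults to TRUE, so it is set to FALSE to make the match case-aware
toks_upper <- tokens_keep(toks, pattern = "^\\p{Lu}{2,}$",
                          valuetype = "regex", case_insensitive = FALSE)

# tolower = FALSE keeps the surnames in capitals in the dfm
surname_dfm <- dfm(toks_upper, tolower = FALSE)
topfeatures(surname_dfm)

# Word cloud via quanteda.textplots, only if it is installed
if (requireNamespace("quanteda.textplots", quietly = TRUE)) {
  quanteda.textplots::textplot_wordcloud(surname_dfm, min_count = 1)
}
```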

Captain Hat