R: Frequency table that is case insensitive

Question

Here is one column of my df: [df$City]
(I have other columns, but I'm just showing one column for simplicity.)

City        
Seattle     
San Diego   
Bern       
SEATTLE
SEATTLE
BERN

I want to do a frequency count on the cities. I want both "Seattle" and "SEATTLE" to be considered the same - basically, I want the frequency table calculation to be case insensitive.

If I use table(df) it gives me "Seattle" and "SEATTLE" as two different items. I tried to overcome this by using toupper(df) before doing table(df).

However, I get the error: invalid multibyte string.

I checked the encoding of my file and it seems to be UTF-8 - I could be wrong - is there a way for me to check the encoding?

Does anyone know how I can get a frequency table that is case insensitive? It doesn't have to be using my approach.

Thanks in advance!!

@eipi10, thanks for the answer. However, this doesn't work. It gives me the error: "invalid multibyte string" — user4918087, Jun 01 '15 at 16:51
I just noticed that the column you posted is actually called `City`. What is `alpha`? — eipi10, Jun 01 '15 at 16:54
@eipi10, I apologize - it was a typo on my part. It should be City - I have changed it accordingly. — user4918087, Jun 01 '15 at 17:02
The answer by @eipi10 (with the correct column name) should work. If it doesn't, you should post `dput(head(df))` so we can see your real data. — Molx, Jun 01 '15 at 17:03

score 3 · Answer 1 · answered Jun 01 '15 at 16:59

You'll want to look into iconv() for the UTF-8 conversion. Also, with the strings, you will probably have to use toupper() or tolower() to standardize them, and maybe stringr::str_trim() to take care of extra whitespace.
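A minimal sketch of how those pieces might fit together, applied to the `City` column rather than to the whole data frame. The sample data below is made up to mirror the question (with one trailing space added for illustration), the source encoding `"latin1"` is only an assumption about the real file, and base `trimws()` is swapped in for `stringr::str_trim()` to keep the example dependency-free.

```r
# Hypothetical sample data mirroring the City column from the question
df <- data.frame(
  City = c("Seattle", "San Diego", "Bern ", "SEATTLE", "SEATTLE", "BERN"),
  stringsAsFactors = FALSE
)

# Flag entries that are not valid UTF-8 (validUTF8() needs R >= 3.3.0)
bad <- !validUTF8(df$City)

# Re-encode to UTF-8. ASSUMPTION: the source encoding is latin1; replace
# `from` with the file's real encoding. sub = "" drops bytes that cannot
# be converted instead of turning the whole string into NA.
city <- iconv(df$City, from = "latin1", to = "UTF-8", sub = "")

# Trim surrounding whitespace, then normalise case before counting
city <- toupper(trimws(city))

# Case-insensitive frequency table
freq <- table(city)
print(freq)
```

With this sample data the table has three entries: BERN (2), SAN DIEGO (1) and SEATTLE (3). If `bad` contains any `TRUE` values on the real data, those rows are the likely source of the `invalid multibyte string` error.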

cory

Worth mentioning [this](http://stackoverflow.com/questions/4993837/r-invalid-multibyte-string) post which goes into some reasons why the `invalid multibyte string` error could come up – MichaelChirico Jun 02 '15 at 20:51
