R considering the same values of factor as different values

Question

I'm new to R and am trying to subset my data frame on one of its columns. However, some of the matching rows are missing from the resulting subset.

I tried several variations of the spelling in the code, but none of them seem to work, e.g.:

df_Location = df[df$Location == "Samarinda" | df$Location == "Samarinda " | df$Location == "Samarinda. " | df$Location == " Samarinda",]
df_Location
summary(df)

df_Location = df[df$Location == "Samarinda",]
df_Location
summary(df)

Each of these attempts produced a subset of only 7 rows, but there should be 37 matching rows in the original data.

When I used rpivotTable, this is what it showed (Samarinda is listed twice, with counts of 30 and 7, respectively):

Samarinda   30
Samarinda   7
Totals  221

Can anyone advise on how to fix this problem?

Thank you very much

When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. What does `dput(unique(df$Location))` return? — MrFlick, Nov 07 '19 at 04:16

score 0 · Answer 1 · answered Nov 07 '19 at 04:31

If you're certain that the differences come from extraneous characters at the edges of the string, a quick way to get what you want would be to filter to rows where df$Location contains "Samarinda" anywhere:

df_Location = df[grepl("Samarinda", df$Location),]
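One caveat with an unanchored pattern is that it also matches longer strings that merely contain the word. The sketch below uses invented values (including a hypothetical "Samarinda Seberang" entry) to compare the unanchored pattern with an anchored one that tolerates only surrounding whitespace; whether the stricter form is appropriate depends on what the real variants look like.

```r
# Toy values, invented for illustration only
x <- c("Samarinda", " Samarinda ", "Samarinda.", "Samarinda Seberang")

# Unanchored: TRUE for any string that contains the word
loose <- grepl("Samarinda", x)

# Anchored: TRUE only when nothing but whitespace surrounds the word
strict <- grepl("^[[:space:]]*Samarinda[[:space:]]*$", x)

# Side-by-side view of which values each pattern accepts
data.frame(value = x, loose = loose, strict = strict)
```

Note that the anchored pattern rejects "Samarinda." with a trailing period, so it would need adjusting if punctuation is one of the variants in the data.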

If you need to be certain exactly why the values are different, a quick hack to find leading/trailing spaces is

unique(paste("X", df$Location, "X", sep = ""))
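As a rough illustration of that diagnostic, here is a sketch on a small made-up data frame (the values are assumptions, not the asker's data). It shows why the exact comparison finds only some rows and how the marker trick and `nchar()` expose the padding.

```r
# Toy data standing in for the asker's df (values are invented)
df <- data.frame(
  Location = c("Samarinda", "Samarinda ", " Samarinda", "Balikpapan"),
  stringsAsFactors = FALSE
)

# Exact comparison matches only the unpadded spelling
n_exact <- sum(df$Location == "Samarinda")

# Wrapping each value in markers makes edge whitespace visible
marked <- unique(paste0("[", df$Location, "]"))
marked

# nchar() is another quick check: padded values are longer than expected
lengths_by_value <- nchar(unique(df$Location))
lengths_by_value
```

Running `dput(unique(df$Location))`, as suggested in the comment above, gives similar information and also reveals non-printing characters as escape sequences.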

score 0 · Answer 2 · answered Nov 07 '19 at 05:30

An alternative to grepping could be running the strings through trimws, like so:

df_Location = df[trimws(df$Location) == "Samarinda",]
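If the column is a factor, as the question title suggests, it may be more convenient to clean it once rather than trimming inside every comparison. The following is a minimal sketch on invented data, assuming the only differences are leading or trailing whitespace; it would not merge variants such as "Samarinda. " with a trailing period.

```r
# Toy data (invented); Location is stored as a factor
df <- data.frame(
  Location = factor(c("Samarinda", "Samarinda ", " Samarinda", "Balikpapan"))
)

# Normalise once: convert to character, trim both ends, rebuild the factor
df$Location <- factor(trimws(as.character(df$Location)))

# The padded variants now collapse into a single level
levels(df$Location)
table(df$Location)

# Subsetting on the clean value picks up every former variant
df_Location <- df[df$Location == "Samarinda", , drop = FALSE]
nrow(df_Location)
```

After this step, summaries and pivot tables built from `df$Location` should list each location once.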

answered Nov 07 '19 at 05:30 by iod
