I'm a new R user trying to subset a data frame on one of its columns. However, some of the matching rows are missing from the resulting subset.

I tried several spelling variations in the code, but none of them seem to work, for example:

df_Location = df[df$Location == "Samarinda" | df$Location == "Samarinda " | df$Location == "Samarinda. " | df$Location == " Samarinda",]
df_Location
summary(df)

df_Location = df[df$Location == "Samarinda",]
df_Location
summary(df)

Each of these attempts returns a subset of only 7 rows, but there should be 37 rows for Samarinda in the original data.

When I used rpivotTable, this is what it showed (Samarinda is listed twice, with counts of 30 and 7, respectively):

Samarinda   30
Samarinda   7
Totals  221

Can anyone advise on how to fix this problem?

Thank you very much

    When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. What does `dput(unique(df$Location))` return? – MrFlick Nov 07 '19 at 04:16

2 Answers

If you're certain that the differences come from extraneous characters at the edges of the string, a quick way to get what you want would be to filter to rows where df$Location contains "Samarinda" anywhere:

df_Location = df[grepl("Samarinda", df$Location),]
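
To see how the substring match differs from the exact comparison, here is a minimal sketch on made-up data (the `df` below and its `Value` column are hypothetical, not the asker's data). Note that `grepl` would also match any longer name that merely contains "Samarinda", so it is only safe if no such values exist.

```r
# Hypothetical toy data: three spellings of the same city plus one other city.
df <- data.frame(
  Location = c("Samarinda", "Samarinda ", " Samarinda", "Balikpapan"),
  Value = 1:4,
  stringsAsFactors = FALSE
)

# Exact comparison only matches the clean spelling.
exact_rows <- df[df$Location == "Samarinda", ]

# Substring match picks up all three variants; fixed = TRUE treats the
# pattern as a literal string rather than a regular expression.
pattern_rows <- df[grepl("Samarinda", df$Location, fixed = TRUE), ]

nrow(exact_rows)
nrow(pattern_rows)
```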

If you need to be certain exactly why the values are different, a quick hack to find leading/trailing spaces is

unique(paste("X", df$Location, "X", sep = ""))
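
If the markers alone do not reveal the difference, the extra character may be something that prints like a space but is not one, such as a non-breaking space. The sketch below uses made-up values (the vector `loc` is hypothetical) and inspects string lengths and the code point of the last character.

```r
# Hypothetical values: a clean string, a trailing space, and a trailing
# non-breaking space (U+00A0), which looks like a space when printed.
loc <- c("Samarinda", "Samarinda ", "Samarinda\u00a0")

# Wrapping each value in markers makes characters at the edges visible.
marked <- unique(paste0("X", loc, "X"))

# nchar() exposes length differences between values that print alike.
lens <- nchar(loc)

# utf8ToInt() returns the code points of a string; take the last one.
last_codes <- vapply(
  loc,
  function(s) {
    cp <- utf8ToInt(s)
    cp[length(cp)]
  },
  integer(1),
  USE.NAMES = FALSE
)

marked
lens
last_codes
```

A last code point of 32 is an ordinary space, while 160 is a non-breaking space.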
A. S. K.

An alternative to grepping could be running the strings through trimws, like so:

df_Location = df[trimws(df$Location) == "Samarinda",]
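
One caveat, sketched below on made-up data (the `df` here is hypothetical): by default `trimws` only strips spaces, tabs, carriage returns and newlines. Assuming R 3.6.0 or later, its `whitespace` argument accepts a regular expression, and `"[\\h\\v]"` also covers Unicode whitespace such as a non-breaking space. Neither call removes other stray characters, such as the trailing period in "Samarinda. ".

```r
# Hypothetical toy data: a clean value, a trailing space, a trailing
# non-breaking space (U+00A0), and an unrelated city.
df <- data.frame(
  Location = c("Samarinda", "Samarinda ", "Samarinda\u00a0", "Balikpapan"),
  stringsAsFactors = FALSE
)

# Default trimming handles the ordinary space but not the non-breaking one.
default_trim <- df[trimws(df$Location) == "Samarinda", , drop = FALSE]

# A broader whitespace class (requires R >= 3.6.0) also strips U+00A0.
unicode_trim <- df[
  trimws(df$Location, whitespace = "[\\h\\v]") == "Samarinda", ,
  drop = FALSE
]

nrow(default_trim)
nrow(unicode_trim)
```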
iod