Some sample data, from your image (please don't post images of data!):
data1 <- data.frame(imageLikeCount = c(3,27,4,4,16,103),
userName = c("\"testblabla\"", "test_00", "frenchfries", "frenchfries", "test.inc", "\"parmezan_pizza\""))
str(data1)
# 'data.frame': 6 obs. of 2 variables:
# $ imageLikeCount: num 3 27 4 4 16 103
# $ userName : Factor w/ 5 levels "\"parmezan_pizza\"",..: 2 5 3 3 4 1
To fix the problem with factors as well as the embedded quotes:
data1$userName <- gsub('"', '', as.character(data1$userName))
str(data1)
# 'data.frame': 6 obs. of 2 variables:
# $ imageLikeCount: num 3 27 4 4 16 103
# $ userName : chr "testblabla" "test_00" "frenchfries" "frenchfries" ...
Like @DanielWinkler suggested, if you can change how the data is read-in or defined, you might choose to include stringsAsFactors = FALSE
(this argument is accepted in many functions, including read.csv
, read.table
, and most data.frame
functions including as.data.frame
and rbind
):
data1 <- data.frame(imageLikeCount = c(3,27,4,4,16,103),
userName = c("\"testblabla\"", "test_00", "frenchfries", "frenchfries", "test.inc", "\"parmezan_pizza\""),
stringsAsFactors = FALSE)
str(data1)
# 'data.frame': 6 obs. of 2 variables:
# $ imageLikeCount: num 3 27 4 4 16 103
# $ userName : chr "\"testblabla\"" "test_00" "frenchfries" "frenchfries" ...
(Note that this still has embedded quotes, so you'll still need something like data1$userName <- gsub('"', '', data1$userName)
.)
Now, we have data that looks like this:
data1
# imageLikeCount userName
# 1 3 testblabla
# 2 27 test_00
# 3 4 frenchfries
# 4 4 frenchfries
# 5 16 test.inc
# 6 103 parmezan_pizza
and your need to remove duplicates works:
data1[! duplicated(data1), ]
# imageLikeCount userName
# 1 3 testblabla
# 2 27 test_00
# 3 4 frenchfries
# 5 16 test.inc
# 6 103 parmezan_pizza