
There are two columns in my dataset, which contains 33000 rows (huge). Column 1 is called "Surname" and column 2 is called "nickname".

I need to find out how many people's surname is exactly the same as their nickname. Can anyone suggest a function for this in R?

  • Please add data using `dput` and show the expected output for the same. Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Aug 31 '20 at 14:53

3 Answers


In your case, you can simply create a logical test of equality between the two columns. If you then sum the logical values that result from this test, you get the number of TRUEs, that is, the number of rows where the surname and nickname are the same.

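# Example data with randomly sampled names
tab <- data.frame(
  nickname = sample(c("Ana", "Tese", "Maker"), size = 20, replace = TRUE),
  surname = sample(c("Ana", "Ed", "Philip"), size = 20, replace = TRUE)
)

# TRUE where nickname and surname are identical, FALSE otherwise
tab$test <- tab$nickname == tab$surname

# Each TRUE counts as 1, so the sum is the number of matching rows
sum(tab$test)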
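Applied to the column names from the question, the same idea is a one-liner. This is a sketch that assumes the data frame is called `people` (a hypothetical name) with columns `Surname` and `nickname`; `na.rm = TRUE` keeps rows with a missing name from turning the whole count into `NA`.

```r
# Hypothetical stand-in for the asker's 33000-row data frame
people <- data.frame(
  Surname  = c("Khan", "Lee", NA, "Ali"),
  nickname = c("Khan", "Bo", "Sam", "Ali"),
  stringsAsFactors = FALSE
)

# Vectorised comparison: one TRUE/FALSE (or NA) per row
same <- people$Surname == people$nickname

# TRUE counts as 1 and FALSE as 0; na.rm = TRUE skips rows with a missing name
n_same <- sum(same, na.rm = TRUE)
print(n_same)
```

Because `==` is vectorised, this compares all 33000 rows in a single call without a loop.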
Pedro Faria

Fàîžà!

My solution creates a new column in your data frame that is TRUE if the surname and nickname are exactly the same and FALSE if they are not.

To do this, you need the dplyr package. First, create an example data frame:

surname <- c("Smith", "Potter", "Smith")
nickname <- c("Bobby", "Potter", "Smith")
# Use the vectors defined above (the original referred to undefined x and y)
df <- data.frame(surname = surname, nickname = nickname)

Now that we have the dataframe, let's add the dplyr code:

library(dplyr)
df <- df %>% 
  mutate(equal_names = case_when(
    surname == nickname ~ TRUE, 
    surname != nickname ~ FALSE))

The result is:

> df
  surname nickname equal_names
1   Smith    Bobby       FALSE
2  Potter   Potter        TRUE
3   Smith    Smith        TRUE

case_when() returns the value on the right-hand side of `~` for the first condition that is TRUE, so you can return whatever you want for each condition.
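The `equal_names` column holds the per-row answer; to get the single number the question asks for, sum it. The sketch below rebuilds the same three-row example in base R (so it runs without dplyr) and counts the matches; `table()` with `useNA = "ifany"` also shows how many rows did not match or had a missing value.

```r
df <- data.frame(
  surname  = c("Smith", "Potter", "Smith"),
  nickname = c("Bobby", "Potter", "Smith"),
  stringsAsFactors = FALSE
)

# Same values as the case_when() step: TRUE where the names are identical
df$equal_names <- df$surname == df$nickname

# Number of people whose surname equals their nickname
n_equal <- sum(df$equal_names, na.rm = TRUE)
print(n_equal)

# Breakdown of TRUE / FALSE (and NA, if any) counts
print(table(df$equal_names, useNA = "ifany"))
```

Inside a dplyr pipeline, `summarise(n_equal = sum(equal_names))` should give the same count.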

If you want more advanced matching, you'd need to look into how regular expressions work. This post has a few hints about this.
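As one example of looser matching (an assumption about what "advanced" might mean; the question itself asks for exact matches), you could normalise both columns before comparing: lowercase them with `tolower()` and strip everything that is not a letter using the regular expression `[^[:alpha:]]` in `gsub()`. The helper name `normalise_name` and the sample names are hypothetical.

```r
# Hypothetical helper: lowercase and drop non-letters (spaces, apostrophes, hyphens, ...)
normalise_name <- function(x) tolower(gsub("[^[:alpha:]]", "", x))

surname  <- c("Smith", "Potter ", "O'Neil", "Jones")
nickname <- c("smith", "potter", "ONeil", "Jonesy")

# Exact comparison: case, spaces and punctuation all matter
exact_matches <- sum(surname == nickname)

# Loose comparison after normalising both sides
loose_matches <- sum(normalise_name(surname) == normalise_name(nickname))

print(exact_matches)
print(loose_matches)
```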

OTStats

A simple base R approach like the one below might work:

sum(do.call("==",df))

Example

df <- structure(list(surname = c("A", "C", "A", "B", "A", "C", "C", 
"B", "B", "C"), nickname = c("C", "A", "A", "A", "B", "B", "B", 
"B", "C", "A")), class = "data.frame", row.names = c(NA, -10L
))

> df
   surname nickname
1        A        C
2        C        A
3        A        A
4        B        A
5        A        B
6        C        B
7        C        B
8        B        B
9        B        C
10       C        A

> sum(do.call("==",df))
[1] 2
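Note that `do.call("==", df)` passes every column of `df` to `==`, so it only works when the data frame has exactly those two columns. If the real data has more columns, select the two name columns first. The sketch below assumes an extra `id` column purely for illustration and adds `na.rm = TRUE` in case some names are missing.

```r
df <- data.frame(
  id       = 1:5,                          # extra column, for illustration only
  surname  = c("A", "C", "A", "B", NA),
  nickname = c("C", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Pick only the two name columns, then compare them element-wise
n_same <- sum(do.call("==", df[c("surname", "nickname")]), na.rm = TRUE)
print(n_same)
```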
ThomasIsCoding