
Sample file: Imdb sample

In the film data set, some titles appear more than once: 'A star is born' (aka 'Narodziny gwiazdy') appears four times and 'Halloween' three times. These are different movies, released in different years. How can I keep only the titles that occur multiple times and display the details for them?

(titleDetails <- imdb_movies.csv %>%
  group_by(Title) %>%
  summarise(count = n()) %>%
  filter(count > 2))

titleDetails 

The code above displays only the title and the count.

How can I display all the details I have in the data set for those titles?

alistaire
Supek

1 Answer


You can call df[duplicated(df$Title) | duplicated(df$Title, fromLast = TRUE), ], where df is your data frame.

duplicated(df$Title) returns a logical vector that is TRUE for every row whose title has already appeared earlier in the data. The first occurrence of each title is therefore FALSE, even when that title is repeated later.

duplicated(df$Title, fromLast = TRUE) does the same thing, but scans from the last row to the first. This time the last occurrence of each repeated title is the one marked FALSE.

Then, you can get all of the rows with duplicated titles by using the | (or) operator on these two duplicated() calls and index your original data using the resulting logical vector.
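As a rough illustration of the steps above, here is a minimal sketch on a made-up data frame. The name df, the Year column, and the sample rows are assumptions for the example, not the actual IMDb file from the question.

```r
# Hypothetical sample data: titles taken from the question, rows made up for illustration.
df <- data.frame(
  Title = c("Halloween", "Alien", "Halloween",
            "A Star Is Born", "Halloween", "A Star Is Born"),
  Year  = c(1978, 1979, 2007, 1954, 2018, 2018),
  stringsAsFactors = FALSE
)

# TRUE for every occurrence of a title except its first one.
dup_forward <- duplicated(df$Title)

# TRUE for every occurrence of a title except its last one.
dup_backward <- duplicated(df$Title, fromLast = TRUE)

# A row is kept if either scan flags it, so every occurrence
# of a repeated title survives and unique titles are dropped.
repeated <- df[dup_forward | dup_backward, ]

print(repeated)
```

Since the question already uses dplyr, a grouped filter such as group_by(Title) %>% filter(n() > 1) should give an equivalent result while keeping all columns.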

Benjamin Ye