I have three datasets having the same columns name but different data.
Dataset 1
Date Full Name Texts Country Read
02/05/2017 Jack Smith Grey ranks... UK True
02/05/2017 Mary Jane Once upon a time France False
...
24/06/2019 Luke Martyn Bottled... USA True
Similarly for the other dataset 2
Date Full Name Texts Country Read
02/05/2017 Jack Smith The story ... UK True
04/05/2017 Mary Jane She saw ... France False
...
12/03/2019 Chris Martyn From Mexico ... Mexico False
and the dataset 3.
Date Full Name Texts Country Read
23/02/2017 Jack Smith They said ... UK True
01/05/2017 Martin O'Connell When I was young ... Ireland True
...
14/07/2018 Laura Cross My family ... Italy True
I would like to get the following information from these datasets:
- unique/distinct names, i.e. names that are not included in one or more datasets (but in this case I would like to know if it would be possible to know in which dataset they are included/missing)
- common names, i.e. names that are in all the three datasets or in two of them (same as above).
Some people are in all three datasets; some others are not. Of course, if people are the same, I have same information regarding the Country, but different texts. I should probably join the three datasets to see which unique names are in all the three datasets and where they differ. And, yes, probably it would be necessary to create a new column to get information about the dataset that include or it does not the name.
I know how to join the three datasets without any condition.
However, I would need to get the information above based on a condition, for example when Read = True
.