I have a huge query with a lot of JOINs. It is producing duplicates.
I am using this technique below that I found here on SO to identify which table the duplicates come from:
SELECT
TableA = '----------', TableA.*,
TableB = '----------', TableB.*
FROM ...
Here is an example of the data:
TABLE_A | USER_ID    | TABLE_B              | LOCATION                  | USER_CODE | LOCATION_CODE | TABLE_C                    | SCI_YEAR_CODE
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | University Of N. Maryland | NULL      | ND            | BIO_PATHS_SCIENCE_RESEARCH | 2016_AAB
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | University Of N. Maryland | NULL      | ND            | BIO_PATHS_SCIENCE_RESEARCH | 2017_RRT
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | University Of N. Maryland | NULL      | ND            | BIO_PATHS_SCIENCE_RESEARCH | 2016_AAB
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | University Of N. Maryland | NULL      | ND            | BIO_PATHS_SCIENCE_RESEARCH | 2017_RRT
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | California of College     | NULL      | MH            | BIO_PATHS_SCIENCE_RESEARCH | 2016_AAB
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | California of College     | NULL      | MH            | BIO_PATHS_SCIENCE_RESEARCH | 2017_RRT
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | California of College     | NULL      | MH            | BIO_PATHS_SCIENCE_RESEARCH | 2016_AAB
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | California of College     | NULL      | MH            | BIO_PATHS_SCIENCE_RESEARCH | 2017_RRT
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | New York City Tech        | NULL      | BS            | BIO_PATHS_SCIENCE_RESEARCH | 2016_AAB
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | New York City Tech        | NULL      | BS            | BIO_PATHS_SCIENCE_RESEARCH | 2017_RRT
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | New York City Tech        | NULL      | BS            | BIO_PATHS_SCIENCE_RESEARCH | 2016_AAB
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | New York City Tech        | NULL      | BS            | BIO_PATHS_SCIENCE_RESEARCH | 2017_RRT
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | New York City Tech        | NULL      | BS            | BIO_PATHS_SCIENCE_RESEARCH | 2016_AAB
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | New York City Tech        | NULL      | BS            | BIO_PATHS_SCIENCE_RESEARCH | 2017_RRT
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | New York City Tech        | NULL      | BS            | BIO_PATHS_SCIENCE_RESEARCH | 2016_AAB
USER    | 1092993811 | COL_PATHS_SCIENCE_ED | New York City Tech        | NULL      | BS            | BIO_PATHS_SCIENCE_RESEARCH | 2017_RRT
You can see that most of the duplicates come from the TABLE_C columns, i.e. from BIO_PATHS_SCIENCE_RESEARCH.
For SCI_YEAR_CODE, I just need the row with the most recent date, and only where the SCI_YEAR_CODE ends with RRT.
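To make that requirement concrete, here is a minimal sketch of the filter applied to BIO_PATHS_SCIENCE_RESEARCH on its own. It rests on assumptions that are not confirmed by the query above: that the table has a USER_ID column to group on (only SCI_YEAR_CODE appears in the sample output), that SCI_YEAR_CODE always starts with a four-digit year so that sorting it as a string matches date order, and that ROW_NUMBER() is available on the server.

```sql
-- Sketch only: keep one row per user from BIO_PATHS_SCIENCE_RESEARCH,
-- restricted to codes ending in RRT, choosing the most recent year.
-- Assumption: r.USER_ID exists; it is not shown in the sample data.
SELECT latest.USER_ID,
       latest.SCI_YEAR_CODE
FROM (
    SELECT r.USER_ID,
           r.SCI_YEAR_CODE,
           -- Assumption: the code begins with a 4-digit year (e.g. 2017_RRT),
           -- so descending string order puts the newest year first.
           ROW_NUMBER() OVER (
               PARTITION BY r.USER_ID
               ORDER BY r.SCI_YEAR_CODE DESC
           ) AS rn
    FROM BIO_PATHS_SCIENCE_RESEARCH AS r
    WHERE r.SCI_YEAR_CODE LIKE '%RRT'   -- only codes that end with RRT
) AS latest
WHERE latest.rn = 1;
```

A derived table like this could be joined in place of the raw table, so that it contributes at most one row per user. It would not remove the rows that are identical in every column, which presumably come from a different join.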
Is there a way to "weed" these duplicates out?
Thanks!