I have the following data set (sample):
emplid | Citizeship |
100001 | USA |
100001 | CAN |
100001 | CHN |
100002 | USA |
100002 | CHN |
100003 | USA |
Is there a way to transform the data into the following:
emplid | Citizeship_1 | Citizenship_2 | Citizenship_3
100001 | USA | CHN | CAN
100002 | USA | CHN |
100003 | USA | |
The assumption is that each emplid
will have up to 4 citizenships.
I started with the following codes, but for the emplid
s who just have 1 citizenship
, the value is being repeated in the citizenship_2
, citizenship_3
, which should be just blank:
select *
, substring_index(Citizenship_multiple, ',', 1) as Citizenship_1
, substring_index(substring_index(Citizenship_multiple,',',-1),',',1) as Citizenship_2
, substring_index(substring_index(Citizenship_multiple,',',-2),',',1) as Citizenship_3
, substring_index(substring_index(Citizenship_multiple,',',-3),',',1) as Citizenship_4
from
(select *
, group_concat(distinct Citizenship) as Citizenship_multiple
from `citizenship_csv_meta`
group by emplid) a