This is response from one sql query in pandas. I want to concatenate, "Label column", together based on Issue and client. I tried Group By, but is only works with integer values. Any idea, how I can do this? Pandas based solution should also be fine.
I tried .groupby
, in pandas too, command and output below, it only gives me a subset of needed dataframe.
Is it possible to update the label
column against each Issue
in first dataframe and removed duplicates and get output as expected below?
SQL Version is:
Microsoft SQL Server 2014
Output:
Issue | Subject | type | Team | Sub Team | Client | Priority | CreatedOn | Label | BuiltOn | CreatedBy | Status | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | ABCABC | Bug | Develop | Automation | Andy | 0 | 2021-01-11 00:00:00 | Enhancement | None | John | InProgress |
1 | 2 | DEFDEF | Bug | Develop | Automation | Judy | 0 | 2021-01-10 00:00:00 | Feature | None | Andre | New |
2 | 3 | HIGHIG | Bug | Develop | Testing123 | Cathy | 2 | 2021-02-11 00:00:00 | Feature | None | Keith | New |
3 | 3 | HIGHIG | Bug | Develop | Testing123 | Cathy | 2 | 2021-02-11 00:00:00 | Internal | None | Keith | New |
4 | 4 | XYZXYZ | Bug | Develop | Automation | Jack | 1 | 2021-05-11 00:00:00 | Enhancement | None | Maya | Analysis |
5 | 4 | XYZXYZ | Bug | Develop | Automation | Jack | 1 | 2021-05-11 00:00:00 | Internal | None | Maya | Analysis |
6 | 4 | XYZXYZ | Bug | Develop | Automation | Larry | 1 | 2021-05-11 00:00:00 | Enhancement | None | Maya | Analysis |
7 | 4 | XYZXYZ | Bug | Develop | Automation | Larry | 1 | 2021-05-11 00:00:00 | Internal | None | Maya | Analysis |
8 | 4 | XYZXYZ | Bug | Develop | Automation | Colin | 1 | 2021-05-11 00:00:00 | Enhancement | None | Maya | Analysis |
9 | 4 | XYZXYZ | Bug | Develop | Automation | Colin | 1 | 2021-05-11 00:00:00 | Internal | None | Maya | Analysis |
10 | 4 | XYZXYZ | Bug | Develop | Automation | Nitin | 1 | 2021-05-11 00:00:00 | Enhancement | None | Maya | Analysis |
11 | 4 | XYZXYZ | Bug | Develop | Automation | Nitin | 1 | 2021-05-11 00:00:00 | Internal | None | Maya | Analysis |
12 | 4 | XYZXYZ | Bug | Develop | Automation | Lisa | 1 | 2021-05-11 00:00:00 | Enhancement | None | Maya | Analysis |
13 | 4 | XYZXYZ | Bug | Develop | Automation | Lisa | 1 | 2021-05-11 00:00:00 | Internal | None | Maya | Analysis |
Expected (Note the label column):
Issue | Subject | Issue_type | Team | Sub Team | Client | Priority | CreatedOn | Label | BuiltOn | CreatedBy | Status | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | ABC | Bug | Develop | Automation | Andy | 0 | 2021-01-11 00:00:00 | Enhancement | None | John | InProgress |
1 | 2 | DEF | Bug | Develop | Automation | Judy | 0 | 2021-01-10 00:00:00 | Feature | None | Andre | New |
2 | 3 | HIG | Bug | Develop | Testing | Cathy | 2 | 2021-02-11 00:00:00 | Feature, Internal | None | Keith | New |
3 | 4 | XYZ | Bug | Develop | Automation | Jack | 1 | 2021-05-11 00:00:00 | Enhancement, Internal | None | Maya | Analysis |
4 | 4 | XYZ | Bug | Develop | Automation | Larry | 1 | 2021-05-11 00:00:00 | Enhancement, Internal | None | Maya | Analysis |
5 | 4 | XYZ | Bug | Develop | Automation | Colin | 1 | 2021-05-11 00:00:00 | Enhancement, Internal | None | Maya | Analysis |
6 | 4 | XYZ | Bug | Develop | Automation | Nitin | 1 | 2021-05-11 00:00:00 | Enhancement, Internal | None | Maya | Analysis |
7 | 4 | XYZ | Bug | Develop | Automation | Lisa | 1 | 2021-05-11 00:00:00 | Enhancement, Internal | None | Maya | Analysis |
Update: This is they query:
SELECT I.Issue,
I.Subject,
I.type,
P.Team,
P.Subteam,
CR.Client,
I.Priority,
I.CreatedOn,
L.Label,
I.BuiltOn,
I.CreatedBy,
I.Status
FROM master.IssueRequests AS I
JOIN master.Participants AS P
ON P.Issue = I.Issue
JOIN master.ClientRecords AS CR
ON CR.Issue = I.Issue
JOIN master.IssueLabels AS L
ON L.Issue = I.Issue
WHERE I.Issue IN ('2652523', '2703670', '2984120')
Update2
Output of df.groupby
:
df.groupby(['Issue', 'Client'])['Label'].apply(','.join).reset_index()
Output:
Issue | Client | Label | |
---|---|---|---|
0 | 1 | Andy | Enhancement |
1 | 2 | Judy | Feature |
2 | 3 | Cathy | Feature,Internal |
3 | 4 | Colin | Enhancement,Internal |
4 | 4 | Jack | Enhancement,Internal |
5 | 4 | Larry | Enhancement,Internal |
6 | 4 | Lisa | Enhancement,Internal |
7 | 4 | Nitin | Enhancement,Internal |
Clarification: Merging on all columns except Label
will not work since in some cases, some of the other data might be "null" or different, which can cause data to be missed altogether. If the data is different in other columns, I can keep the first instance of that data.