Converting duplicate date rows into columns rows?

Question

I am trying to convert rows (which have one single key but is duplicated due to multiple changes in the change period start and change period end date). I thought converting them into rows will remove duplicate values. I tried pivoting in Python but it did not work since the values will be date column and it cannot perform any operation on that.

Here is what I have:

Here is what I am trying to achieve

PS - I have million records with multiple orders. I would need help with a solution which can kind of automate it.

Help us help you. Provide an example that we can copy and paste (make it reproducible!) — Jim O., Jun 10 '19 at 02:07

Jim O. · Answer 1 · 2019-06-10T02:48:50.933

1

Python solution:

import pandas as pd
df = pd.DataFrame({"Change Period Start":["2/2/2019", "2/2/2019", "2/2/2019", "9/11/2019"], 
                   "Change Period End":["9/11/2019", "9/11/2019", "5/5/2019", "9/11/2019"], 
                   "Change Period Supplier":["1/1/2020", "1/1/2020", "1/1/2025", "9/11/2019"]})

df.drop_duplicates(subset=['Change Period Supplier'])

Change Period Start Change Period End   Change Period Supplier
            2/2/2019        9/11/2019                 1/1/2020
            2/2/2019         5/5/2019                 1/1/2025
           9/11/2019        9/11/2019                9/11/2019

R solution:

Change.Period.Start <- c("2/2/2019", "2/2/2019", "2/2/2019", "9/11/2019")
Change.Period.End <- c("9/11/2019", "9/11/2019", "5/5/2019", "9/11/2019")
Change.Period.Supplier <- c("1/1/2020", "1/1/2020", "1/1/2025", "9/11/2019")
df = data.frame(Change.Period.Start, Change.Period.End, Change.Period.Supplier)

df[!duplicated(df$Change.Period.Supplier), ]

  Change.Period.Start Change.Period.End Change.Period.Supplier
1            2/2/2019         9/11/2019               1/1/2020
3            2/2/2019          5/5/2019               1/1/2025
4           9/11/2019         9/11/2019              9/11/2019

Updated R version according to comments by OP

GR.Key <- c("A", "A", "A", "B")
Change.Period.Start <- c("2/2/2019", "2/2/2019", "2/2/2019", "9/11/2019")
Change.Period.End <- c("9/11/2019", "9/11/2019", "5/5/2019", "9/11/2019")
Change.Period.Supplier <- c("1/1/2020", "1/1/2020", "1/1/2025", "9/11/2019")
df = data.frame(GR.Key, Change.Period.Start, Change.Period.End, Change.Period.Supplier)

library(data.table)
dcast(df, GR.Key ~ paste0("Change.Period.Start", rowid(GR.Key)), value.var = "Change.Period.Start")

  GR.Key Change.Period.Start1 Change.Period.Start2 Change.Period.Start3
1      A             2/2/2019             2/2/2019             2/2/2019
2      B            9/11/2019                 <NA>                 <NA>

edited Jun 10 '19 at 02:48

answered Jun 10 '19 at 02:25

Jim O.

1,091
12
31

Hi, I really appreciate your answer. However, I have around million values. This will not work for that. Can you please give me a different solution? – VG95 Jun 10 '19 at 02:37
Could you elaborate on the problem? – Jim O. Jun 10 '19 at 02:37
Yes, for every GR Key there are like 50-60 changes. Therefore, the whole dataset may have unique 500-600 GR Keys but due to duplicate values(because of multiple change order dates), i need to bring all those rows into columns. – VG95 Jun 10 '19 at 02:38
Are you saying that you want to see only the duplicate values? – Jim O. Jun 10 '19 at 02:39
I am saying that for every GR_Key, i want to display all change dates in a single row. – VG95 Jun 10 '19 at 02:40
By the way, your questions should look like my answer: reproducible data with the code you've tried to achieve your solution. – Jim O. Jun 10 '19 at 02:51
Hi, Thanks for writing the update code. However, I am looking out at something like this (https://stackoverflow.com/questions/37653779/python-pandas-pivot-from-long-to-wide). Unfortunately, I am new and do not know how to post a table here! – VG95 Jun 10 '19 at 02:54
I understand. People in Stacks helped me when I was new here. I want to give back by doing the same thing, now that I have become more proficient at programming. I wish to help you, but I don't think I am understanding the question. I updated my code to make my solution look like the picture you posted. – Jim O. Jun 10 '19 at 02:58

Converting duplicate date rows into columns rows?

1 Answers1