0

I have a dataframe that has two date containing columns I'd like to perform the following operations on:

  1. Concatenate into a NEW column.
  2. Get the unique values (no redundant dates).
data = [
    [
        "2018-12-08",
        "2018-12-09",
    ],
    ["2020-12-19", "2020-12-20"],
    ["2020-12-19", "2020-12-19"],
    ["2020-10-06", "2020-10-12"],
]

df = pd.DataFrame(data, columns=["date1", "date2"])
df[["date1", "date2"]].apply(lambda x: pd.to_datetime(x).dt.date)

The challenge I am facing is finding a clear + concise way to do this.

To create one column I am doing this:

df['date_range'] = df[['date1', 'date2']].astype(str).values.tolist()

but I am not sure how to get the unique values from each row?

My desired dataframe looks like this:

date1       date2            date_range
2018-12-08  2018-12-09  2018-12-08,2018-12-09
2018-12-19  2018-12-20  2018-12-19,2018-12-20
2018-12-19  2018-12-19  2018-12-19
2018-10-06  2018-10-12  2018-10-06,2018-10-12
user
  • 651
  • 10
  • 22

2 Answers2

2

This should work. Usings.unique() preserves the order of the series, so if the order of the start and end dates is important, than this would be a solution.

df['date_range'] = df[['date1', 'date2']].apply(lambda x: x.unique(),axis=1).str.join(', ')
rhug123
  • 7,893
  • 1
  • 9
  • 24
1

You can use set() to find the unique elements in each row, and a list comprehension to generate your desired result, joining each unique list together with a comma. Something like

df['date_range'] = [','.join(list(set(dates))) for dates in df[['date1', 'date2']].astype(str).values]

Also, you can drop the .tolist()!

  • 1
    Note that `set()` does *not* preserve order. E.g. `','.join(list(set(["2018-12-08","2018-12-09"])))` may spill out `'2018-12-08,2018-12-09'`, but `','.join(list(set(["2018-12-09","2018-12-10"])))` produces `'2020-12-10,2020-12-09'`. (There is a [logic](https://stackoverflow.com/questions/15479928/why-is-the-order-in-dictionaries-and-sets-arbitrary) to this.) So, you cannot rely on the 2nd date in your concatenation to represent the value from the 2nd col. If `date1` <= `date2` is always true, you could use `sorted()` instead of `list()`. Otherwise, the answer by @rhug123 is one way to go. – ouroboros1 Jul 13 '22 at 07:20