So I currently have a dataframe that looks like the following:
+-------------+----------------+---------------+------------------+-----------------+
| customer_id | init_base_date | init_end_date | reinit_base_date | reinit_end_date |
+-------------+----------------+---------------+------------------+-----------------+
| ... | | | | |
| A | 2015-07-30 | | | |
| A | | 2016-07-24 | | |
| B | 2015-07-10 | | | |
| B | | 2015-10-05 | | |
| B | | | 2016-01-09 | |
| B | | | | 2016-07-04 |
| C | 2015-05-13 | | | |
| C | | 2015-08-09 | | |
| ... | | | | |
+-------------+----------------+---------------+------------------+-----------------+
and I really need to convert it to the form:
+-------------+----------------+---------------+------------------+-----------------+
| customer_id | init_base_date | init_end_date | reinit_base_date | reinit_end_date |
+-------------+----------------+---------------+------------------+-----------------+
| ... | | | | |
| A | 2015-07-30 | 2016-07-24 | | |
| B | 2015-07-10 | 2015-10-05 | 2016-01-09 | 2016-07-04 |
| C | 2015-05-13 | 2015-08-09 | | |
| ... | | | | |
+-------------+----------------+---------------+------------------+-----------------+
I can think of a couple of really tedious ways to do this, but I was wondering if there is a quick and efficient method (maybe using window functions?). I've only been using PySpark for a month now, so I'm definitely still a novice.