1

I have a dataframe with 28 column, 27 of which are datetime columns, 1 of which is an ID column. I need to convert the dataset into 3 total columns where there's one datatime column, a second column with the original header from which the datetime derived, and a third with the original ID from which the datetime derived.

Example of non-transformed data:

ID Buy    Sell     Hold 
1  2/2/17 2/3/17   2/4/17
2  3/2/17 3/3/17   3/4/17
3  4/2/17 4/3/17   4/4/17

Example of transformed data:

Date   Activity ID
2/2/17 Buy      1
2/3/17 Sell     1
2/4/17 Hold     1
3/2/17 Buy      2
3/3/17 Sell     2
3/4/17 Hold     2
4/2/17 Buy      3 
4/3/17 Sell     3 
4/4/17 Hold     3 

My understanding is that this task can be achieved with pandas melt, however is it feasible with 27 datetime columns coupled with a single ID column?

How do I achieve this task with so many dimensions?

user3483203
  • 50,081
  • 9
  • 65
  • 94
Mr. Jibz
  • 511
  • 2
  • 7
  • 21
  • 1
    For the ordering you have in your desired output, seems like you want `df.set_index('ID').stack()`, but yes, `melt` will work with that many columns – user3483203 Jun 06 '19 at 15:36
  • good solution, but how would i ensure `ID` is populating each row, that is, along with each event and datetime input? from what I see it's grouping event and datetime by the `ID` once only per `ID`. – Mr. Jibz Jun 06 '19 at 15:39
  • That's just how the representation of a multi-index looks. If you want them as columns, I guess try `df.set_index('ID').stack().rename_axis(['ID', 'Activity']).rename('Date').reset_index()` – user3483203 Jun 06 '19 at 15:42

1 Answers1

2

Use melt:

df_melt=df.melt(id_vars='ID',value_vars=['Buy','Sell','Hold'],var_name='Activity',value_name='Date')

You can also hyphen your 27 columns with a '_c' for example an then create a list similar to

list=[c for c in df if c.endwith('_c')] and reference the list in the melt function

Nev1111
  • 1,039
  • 9
  • 13
  • If you only supply `id_vars`, the rest of the columns will implicitly be "melted". ["Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars."](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.melt.html) – user3483203 Jun 06 '19 at 15:58
  • how do i exclude any rows where the original column lacked a row value? ultimately this solution works, but i want to pull any column values which do not equal NaN. – Mr. Jibz Jun 06 '19 at 16:01