0

I am trying to convert some data into a more useful format from .xls to .csv with pandas.

I have loaded the data like so:

xls = pd.ExcelFile('file.xls')

which returns a data frame which looks like:

Name    Event1    Date1    Event2    Date2    Event3    Date3
Joe     jump      1.1.13   skip      1.2.13   hop       1.3.14
Jack    skip      1.2.12   run       1.5.14   NA        NA

I would like to reformat the data so names are repeated multiple times for each event they participated in with their date. i.e.

Name    Event   Date
Joe     jump    1.1.13
Joe     skip    1.2.13

In a way I can remove all NA. I have multiple sheets in xcel which is why i want to concatenate everything in this way.

Is there a simple command or am I stuck with for loops?

thanks!

JP1
  • 731
  • 1
  • 10
  • 27

1 Answers1

3

Use pd.lreshape which is the go to method for converting a wide formatted DF to a long one such as this.

This method accepts a dictionary as it's groups parameter wherein the column names starting with a certain prefix are clustered under a single wholesome column.

d = dict(Event=df.filter(regex="^Event").columns, Date=df.filter(regex="^Date").columns)
pd.lreshape(df, d)

enter image description here

Nickil Maveli
  • 29,155
  • 8
  • 82
  • 85
  • Nice one. You could also use `dict(Event=df.columns[df.columns.str.startswith('Event')], Date=df.columns[df.columns.str.startswith('Date')])`. – John Zwinck Mar 24 '17 at 15:15
  • I know, but I wanted to keep things concise ;-) – Nickil Maveli Mar 24 '17 at 15:17
  • I could not find the documentation other than in the code on github:https://github.com/pandas-dev/pandas/blob/master/pandas/core/reshape.py#L857 ; is this function too new ? – Moritz Mar 24 '17 at 15:35
  • Yeah, that's because it isn't a publicly documented method and still very much in it's experimental stages. There have been [speculations](https://github.com/pandas-dev/pandas/issues/15003) going on whether to use `pd.lreshape` or it's sister method `pd.wide_to_long`. Again, if the grouping variables aren't of the same length, then there are chances of it giving out erroneous results. – Nickil Maveli Mar 24 '17 at 15:39