I'm working on a problem involving pandas in Python 3.4, and I'm stuck on one small part: reorganizing my data frames. To be more specific:

I have a table called "model" in the format of:

[image: Model Input]

I would like to get output equivalent to:

[image: Desired Output]

I have looked at "Convert a python dataframe with multiple rows into one row using python pandas?" and "How to combine multiple rows into a single row with pandas". I am confused about whether I should use groupby or a pivot table. I tried both, but I either get a KeyError or output that is not in the format I want. Is there a specific library that can help with this task?

braaterAfrikaaner
  • Please read up on [how to write a good pandas question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples); images are not very useful. – DSM Feb 27 '18 at 01:34
  • I apologise. Thank you for the resource. – Vinay Ashokkumar Feb 27 '18 at 01:53

1 Answer

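You can use groupby and apply:

import pandas as pd

num_V = 5  # number of value columns per row (every column except ID)
# Largest number of rows that any single ID has
max_row = df.groupby('ID').ID.count().max()
df2 = (
        df.groupby('ID')
        # Skip the first column (assumed to be ID) and flatten each group's rows into one 1-D array
        .apply(lambda x: x.values[:,1:].reshape(1,-1)[0])
        # Expand each array into columns, giving one row per ID
        .apply(pd.Series)
        # IDs with fewer rows than max_row are padded with 0
        .fillna(0)
)

# Name the columns row by row; this list must contain exactly max_row * num_V names
df2.columns = ['V{}_{}_{}'.format(i+1,j,i) for j in range(max_row) for i in range(num_V)]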
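
Since the original table was only posted as an image, here is a minimal self-contained sketch of the same reshaping idea on made-up data. The column names `V1` and `V2`, the values, and the `V1_0`-style output names are assumptions for illustration, not the asker's real dataset. This variant selects the value columns by name instead of slicing by position, so an extra column in the frame does not end up in the result.

```python
import pandas as pd

# Hypothetical input: two value columns and an ID with a varying number of rows.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "V1": [10, 11, 20, 21, 22],
    "V2": [1, 2, 3, 4, 5],
})

value_cols = ["V1", "V2"]  # assumed names of the columns to spread out

# One flat Series per ID: row 0's values, then row 1's values, and so on.
flat = {
    key: pd.Series(group[value_cols].values.reshape(-1))
    for key, group in df.groupby("ID")
}

# Shorter Series are padded with NaN when combined, then filled with 0.
wide = pd.DataFrame(flat).T.fillna(0).astype(int)
wide.index.name = "ID"

# Build the column names from the data so the lengths always match.
max_rows = df.groupby("ID").size().max()
wide.columns = ["{}_{}".format(col, row)
                for row in range(max_rows) for col in value_cols]

print(wide)
```

The names are generated with `str.format` rather than an f-string so the sketch also parses on Python versions before 3.6.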
Allen Qin
  • I typed the code exactly as you described, and I get a syntax error at the f'V{i+1}_{j}_{i}' line. I'm running Python 3.4 in a Linux terminal. – Vinay Ashokkumar Feb 27 '18 at 14:18
  • @VinayAshokkumar, that's because your Python version is lower than 3.6 and does not support that syntax (f-strings). Please try again with the updated answer. – Allen Qin Feb 27 '18 at 17:24
  • The syntax is accepted now, but I get a new error: TypeError: set_axis() got multiple values for argument 'axis'. When I run the program without the set_axis() call, the tables are restructured the way I wanted. Thank you for that. Is there a way to work around this error? – Vinay Ashokkumar Feb 27 '18 at 18:22
  • Again, that's because of the version incompatibility. I don't have a lower version installed, so please try the updated answer now. – Allen Qin Feb 27 '18 at 18:26
  • I'm getting a new error: Length mismatch: Expected axis has 30 elements, new values have 24 elements. I think I know why. When I printed the table earlier (without set_axis()), I noticed that the ID column (holding values 1 and 2) is repeated after every four columns, which is why the result has 30 columns when it should have 24 (ID is printed unnecessarily 6 times). Is there a way to get rid of the duplicate ID columns and show only the numerical results? Thank you for your efforts so far. Really appreciate the help! – Vinay Ashokkumar Feb 27 '18 at 18:47
  • Your description doesn't match your screenshot. Until you provide a sample of the dataset, I can't help you any further. – Allen Qin Feb 27 '18 at 21:20
  • Sincere apologies. I got excited with the debugging and gave you an incorrect description of the actual problem I was working on. I have fixed the error: I had an extra numbered column that I should have dropped but didn't. Now the lengths match and the code runs. I upvoted your answer. Thank you very much for your answer and patience! Really appreciate it :) – Vinay Ashokkumar Feb 28 '18 at 00:44
  • No worries. Glad it helped. – Allen Qin Feb 28 '18 at 01:43
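
For readers wondering about the pivot route mentioned in the question, here is a rough sketch of a pivot-style alternative. It uses made-up data (the column names `V1`, `V2` and the values are assumptions). Each row is numbered within its ID using `cumcount`, and that counter is then unstacked into columns. Because ID lives in the index and the column names are derived from the result itself, ID cannot leak into the values and the name list cannot have the wrong length.

```python
import pandas as pd

# Hypothetical input, not the asker's real table.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "V1": [10, 11, 20, 21, 22],
    "V2": [1, 2, 3, 4, 5],
})

# Position of each row within its ID group: 0, 1, 2, ...
row = df.groupby("ID").cumcount()

# Index by (ID, row position), then move the row position into the columns.
wide = df.set_index(["ID", row]).unstack(fill_value=0)

# Columns are (value column, row position) pairs; reorder them row by row.
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Flatten the two-level column index into plain strings.
wide.columns = ["{}_{}".format(col, r) for r, col in wide.columns]

print(wide)
```

If the frame carries an extra helper column, like the numbered column mentioned in the comments, dropping it with `df.drop(columns=[...])` before reshaping keeps it out of the output.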