0

I have a pandas dataframe looking like this:

           variable                 value
0          company_id                  
1           company_name           Rechner
2    former_company_name                  
3         company_street           Teststraße 
4      company_street_nr                20

5           company_name           Rechner2
6    former_company_name                  
7         company_street           Ringstraße 
8      company_street_nr                20

..                   ...               ...


[119 rows x 2 columns]

The dataframe looks like this because it is extracted from an excel file which other people have to fill for me. I need to get all the values in my database. In order to do so I want to generate a csv file. To do this I would like to merge the rows with the same variable name and write a new column with value2. Like this:

  variable                        value                value2
0          company_id                  
1           company_name           Rechner             Rechner2
2    former_company_name                  
3         company_street           Teststraße          Ringstraße 
4      company_street_nr                20              20

At the moment I'm doing this with extracting the rows by their indexes and using a join with the single dataframes. I would like to know if there is a better way to do this.

Ponyo1402
  • 11
  • 3
  • `df.assign(id=df.groupby('variable').cumcount().add(1)).pivot(index='variable', columns='id', values='value').add_prefix('value_')` – mozway Oct 20 '22 at 10:11

1 Answers1

0

First group values as array:

df2 = df.groupby("variable").agg(list)

>>                                           value
>> variable                                       
>> company_id                                [nan]
>> company_name                [Rechner, Rechner2]
>> company_street       [Teststraße , Ringstraße ]
>> company_street_nr                      [20, 20]
>> former_company_name                  [nan, nan]

Then expand the arrays column-wise:

max_len = max(map(len, df2["value"]))
columns = [f"value{i}" for i in range(1,max_len+1)]

pd.DataFrame(df2["value"].to_list(), columns=columns, index=df2.index).reset_index()

>>               variable       value1       value2
>> 0           company_id          NaN         None
>> 1         company_name      Rechner     Rechner2
>> 2       company_street  Teststraße   Ringstraße 
>> 3    company_street_nr           20           20
>> 4  former_company_name          NaN          NaN
Azhar Khan
  • 3,829
  • 11
  • 26
  • 32