0

I've got a question.

imagine I have the following df:

**Document type** **Invoicenumber Invoicedate**  description quantity unit price line amount

Invoice             123            28-08-2020
0                   NaN            17-09-2020    test        1,5        5              20
0                   NaN            16-04-2020    test2       1,5        5              20

Invoice             456            02-03-2020
0                   NaN            NaN           test3       21         3              64
0                   0              NaN           test3       21         3              64            
0                   0              NaN           test3       21         3              64            

The rows where there is a 0 are belonging to the row above and are line items of the same docuemnt.

My goal is to transpose the line items so that these are on the same line for each invoice as such:

**Document type** **Invoicenumber Invoicedate**  description#1  description#2 quantity quantity#2 unit price   unit price #2 line amount line amount #2 
    
    Invoice             123            28-08-2020    test           test2          1,5      1,5     5            5              20         20

same for second invoice line How to achieve this?

Please help
as per request:

{'Document Type': {0: 'IngramMicro.AccountsPayable.Invoice',
  1: 0,
  2: 0,
  3: 'IngramMicro.AccountsPayable.Invoice',
  4: 0,
  5: 0,
  6: 0},
 'Factuurnummer': {0: '0.78861803',
  1: 'NaN',
  2: 'NaN',
  3: '202130534',
  4: 'NaN',
  5: 'NaN',
  6: 'NaN'},
 'Factuurdatum': {0: '2021-05-03',
  1: nan,
  2: nan,
  3: '2021-09-03',
  4: nan,
  5: nan,
  6: nan},
 'description': {0: nan,
  1: 'TM 300 incl onderstel 3058C003 84433210 4549292119381',
  2: 'ESP 5Y 36 inch 7950A539 00000000 4960999794266',
  3: nan,
  4: 'Basistarief A3 Office',
  5: 'Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021',
  6: 'Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021'},
 'quantity': {0: nan, 1: 1.0, 2: 1.0, 3: nan, 4: 1.0, 5: 1.0, 6: 2.0},
 'unit price': {0: nan,
  1: 1211.63,
  2: 742.79,
  3: nan,
  4: 260.0,
  5: 30.0,
  6: 30.0},
 'line amount': {0: nan, 1: 21.0, 2: 21.0, 3: nan, 4: 260.0, 5: 30.0, 6: 30.0}}
Max
  • 493
  • 2
  • 9
  • Struggling to import your sample data. Could you share the output of `df.to_dict()` so that your data can be easily imported with `pd.DataFrame.from_dict()` (see [this questions](https://stackoverflow.com/questions/47450931/print-pandas-data-frame-for-reproducible-example-equivalent-to-dput-in-r))? – gofvonx Mar 19 '21 at 08:49
  • The first and second invoice have different 0 rows, so they will have different columns after transposition. Do you want to keep them in the same dataframe? – Ynjxsjmh Mar 19 '21 at 09:29
  • @Ynjxsjmh yes they have different zero rows! So I would like them to be for instance in description #3, quantity #3 etc. – Max Mar 19 '21 at 10:38
  • @gofvonx I will update the question – Max Mar 19 '21 at 10:42
  • @gofvonx please let me know if it works because it does not work for me ! – Max Mar 19 '21 at 11:02

1 Answers1

1

A not clever but possible way is to generate a new dataframe with desired column by iterating over the rows of original dataframe.

import ast
import sys
import pandas as pd
from io import StringIO

TESTDATA = StringIO("""Document type;Invoicenumber;Invoicedate;description;quantity;unit price;line amount
Invoice;123;28-08-2020
0;NaN;17-09-2020;test;1,5;5;20
0;NaN;16-04-2020;test2;1,5;5;20
Invoice;456;02-03-2020
0;NaN;NaN;test3;21;3;64
0;0;NaN;test3;21;3;64
0;0;NaN;test3;21;3;64
    """)

df = pd.read_csv(TESTDATA, delimiter=";")

df_new = pd.DataFrame(columns=df.columns)

first_type_index = 0
type_count = 0

for index, row in df.iterrows():
    if row['Document type'] == '0':
        type_count += 1

        df_new.loc[first_type_index, f'description#{type_count}'] = row['description']
        df_new.loc[first_type_index, f'quantity#{type_count}'] = row['quantity']
        df_new.loc[first_type_index, f'unit price#{type_count}'] = row['unit price']
        df_new.loc[first_type_index, f'line amount#{type_count}'] = row['line amount']

        row['Document type']
        row['Invoicenumber']
    else:
        first_type_index = index
        type_count = 0
        df_new.loc[first_type_index] = row

df_new.drop(['description', 'quantity', 'unit price', 'line amount'], axis=1, inplace=True)
|   | Document type | Invoicenumber | Invoicedate | description#1 | quantity#1 | unit price#1 | line amount#1 | description#2 | quantity#2 | unit price#2 | line amount#2 | description#3 | quantity#3 | unit price#3 | line amount#3 |
| 0 | Invoice       |         123.0 |  28-08-2020 | test          | "1,5"      |          5.0 |          20.0 | test2         | "1,5"      |          5.0 |          20.0 |               |            |              |               |
| 3 | Invoice       |         456.0 |  02-03-2020 | test3         | 21         |          3.0 |          64.0 | test3         | 21         |          3.0 |          64.0 | test3         |         21 |          3.0 |          64.0 |

To resort the column names, you could use

column_to_swap = df_new.columns[3:]
len_swap = len(column_to_swap)

column_swap = [None] * len_swap

j = 0

# 4 is the count of duplicated columns
# description, quantity, unit price, line amount
for i in range(4):
    fast_i = i

    while fast_i < len_swap:
        column_swap[j] = column_to_swap[fast_i]
        fast_i += 4
        j += 1

df_new = df_new.reindex(columns=list(df_new.columns.values[0:3])+column_swap)
|   | Document type | Invoicenumber | Invoicedate | description#1 | description#2 | description#3 | quantity#1 | quantity#2 | quantity#3 | unit price#1 | unit price#2 | unit price#3 | line amount#1 | line amount#2 | line amount#3 |
| 0 | Invoice       |         123.0 |  28-08-2020 | test          | test2         |               | "1,5"      | "1,5"      |            |          5.0 |          5.0 |              |          20.0 |          20.0 |               |
| 3 | Invoice       |         456.0 |  02-03-2020 | test3         | test3         | test3         | 21         | 21         |         21 |          3.0 |          3.0 |          3.0 |          64.0 |          64.0 |          64.0 |
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52