1

I need an application to work on CSV files where I need to split one single row into multiple rows based on multiple value in any particular column. It would be better if anyone can help me with python code but any other solution is welcomed. Please help me to solve this. Thanks

Here is the input file: [![enter image description here][1]][1]enter image description here`

And I want to transfer this file into like below: enter code here

enter image description here

If I open the csv file in Notepad++, I can see these as below:
`Name,ID Club,Position,Inactive,Attendance Date

Arnold Simon,xxxxx,Volleyball - Women's,Player,No,"04/07/2021 04/05/2021"

Arnold Simon,xxxxx,Volleyball - Women's,Safety Officer,No,

Mike Anderson,yyyyy,Volleyball - Women's,Player,No,"04/07/2021 04/05/2021"

Thomas Engle,AAAAA,Volleyball - Women's,Player,No,4/5/2021

Chris Haris,BBBBB,Baseball,Player,No,"07/12/2021 07/11/2021 06/29/2021 06/25/2021 06/16/2021"`

Sajjad
  • 57
  • 1
  • 7
  • Is the CSV well-formed regarding multiline fields? I.e. is the value wrapped in quotes or some sort of open/close marker? – mojo Aug 02 '21 at 14:06
  • Depending on the answer to that question, [csv.reader](https://docs.python.org/3/library/csv.html#csv.reader) takes an optional `newline` parameter that you can use to allow newlines embedded in a record/field. – mojo Aug 02 '21 at 14:09
  • Pandas and itertools give an example here: https://stackoverflow.com/questions/50731229/split-cell-into-multiple-rows-in-pandas-dataframe – SNygard Aug 02 '21 at 14:11
  • @mojo if I open the csv file into notepad++ I see the file as below: Name,ID Club,Position,Inactive,Attendance Date Arnold Simon,xxxxx,Volleyball - Women's,Player,No,"04/07/2021 04/05/2021" Arnold Simon,xxxxx,Volleyball - Women's,Safety Officer,No, Mike Anderson,yyyyy,Volleyball - Women's,Player,No,"04/07/2021 04/05/2021" Thomas Engle,AAAAA,Volleyball - Women's,Player,No,4/5/2021 Chris Haris,BBBBB,Baseball,Player,No,"07/12/2021 07/11/2021 06/29/2021 06/25/2021 06/16/2021" – Sajjad Aug 02 '21 at 14:24
  • @Sajjad, I can't see newlines or anything in the comment, but I could believe that there are newlines in the "Attendance Date" field data you posted. See what happens with csv.reader. – mojo Aug 02 '21 at 18:41

2 Answers2

1

I got it done with this below 1 line code!

`import pandas as pd

df = pd.read_csv('testfile.csv')

res = df.set_index(['Name', 'ID', 'Club','Position','Inactive']).apply(lambda x: x.str.split('\n').explode()).reset_index()

res.to_csv('output.csv')`

Sajjad
  • 57
  • 1
  • 7
1

You could do this short one:

import csv
with open('input_file.csv') as file:
    new = [row[:-1] + [i] for row in csv.reader(file) for i in row[-1].split()]
with open('output_file.csv', 'w') as file:
    csv.writer(file).writerows(new)

Or this equivalent longer one:

import csv
with open('input_file.csv') as f_in, open('output_file.csv', 'w') as f_out:
    csv_out = csv.writer(f_out)
    for row in csv.reader(f_in):
        for i in row[-1].split():
            csv_out.writerow(row[:-1] + [i])
Pi Marillion
  • 4,465
  • 1
  • 19
  • 20
  • just a little correction for the 2nd one: import csv with open('testfile.csv','r') as f_in, open('output_file2.csv','w',newline='') as f_out: csv_out = csv.writer(f_out) for row in csv.reader(f_in): for i in row[-1].split(): csv_out.writerow(row[:-1] + [i]) – Sajjad Aug 05 '21 at 18:30