Pivot a CSV string using python without using pandas or any similar library

Question

You may think of this one as another redundant question asked, but I tried to go through all similar questions asked, no luck so far. In my specific use-case, I can't use pandas or any other similar library for this operation.

This is what my input looks like

AttributeName,Value
Name,John
Gender,M
PlaceofBirth,Texas
Name,Alexa
Gender,F
SurName,Garden

This is my expected output

Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,

So far, I have tried to store my input into a dictionary and then tried writing it to a csv string. But, it is failing as I am not sure how to incorporate missing column values conditions. Here is my code so far

  reader = csv.reader(csvstring.split('\n'), delimiter=',')

  csvdata = {}
  csvfile = ''
  for row in reader:
    if row[0] != '' and row[0] in csvdata and row[1] != '':
      csvdata[row[0]].append(row[1])
    elif row[0] != '' and row[0] in csvdata and row[1] == '':
      csvdata[row[0]].append(' ')
    elif row[0] != '' and row[1] != '':
      csvdata[row[0]] = [row[1]]
    elif row[0] != '' and row[1] == '':
      csvdata[row[0]] = [' ']
    
  for key, value in csvdata.items():
    if value == ' ':
      csvdata[key] = []

  csvfile += ','.join(csvdata.keys()) + '\n'
  for row in zip(*csvdata.values()):
    csvfile += ','.join(row) + '\n'

For the above code as well, I took some help here. Thanks in advance for any suggestions/advice.

Edit #1 : Update code to imply that I am doing processing on a csv string instead of a csv file.

is 'Name' mandatory? – Lante Dellarovere Apr 29 '19 at 12:05 — Lante Dellarovere, Apr 29 '19 at 12:05
No value in column AttributeName column is mandatory. – Saurabh Saxena Apr 29 '19 at 12:30 — Saurabh Saxena, Apr 29 '19 at 12:30

score 1 · Answer 1 · answered Apr 29 '19 at 11:55

1

What you need is something like that:

import csv

with open("in.csv") as infile:
    buffer = []
    item = {}

    lines = csv.reader(infile)
    for line in lines:
        if line[0] == 'Name':
            buffer.append(item.copy())
            item = {'Name':line[1]}
        else:
            item[line[0]] = line[1]
    buffer.append(item.copy())

for item in buffer[1:]:
    print item

answered Apr 29 '19 at 11:55

framontb

1,817
1
15
33

with the buffer[1:] list, you can save it in a csv or whatever – framontb Apr 29 '19 at 11:58

Lante Dellarovere · Answer 2 · 2019-04-29T14:59:33.903

If none of the attributes is mandatory, I think @framontb solution needs to be rearranged in order to work also when Name field is not given.
This is an import-free solution, and it's not super elegant.

I assume you have lines already in this form, with this columns:

lines = [
    "Name,John",
    "Gender,M",
    "PlaceofBirth,Texas",
    "Gender,F",
    "Name,Alexa",
    "Surname,Garden"  # modified typo here: SurName -> Surname
]

cols = ["Name", "Gender", "Surname", "PlaceofBirth"]

We need to distinguish one record from another, and without mandatory fields the best I can do is start considering a new record when an attribute has already been seen.
To do this, I use a temporary list of attributes tempcols from which I remove elements until an error is raised, i.e. new record.

Code:

csvdata = {k:[] for k in cols}

tempcols = list(cols)
for line in lines:
    attr, value = line.split(",")
    try:
        csvdata[attr].append(value)
        tempcols.remove(attr)
    except ValueError:
        for c in tempcols:  # now tempcols has only "missing" attributes 
            csvdata[c].append("")
        tempcols = [c for c in cols if c != attr]
for c in tempcols:
    csvdata[c].append("")

# write csv string with the code you provided
csvfile = ""
csvfile += ",".join(csvdata.keys()) + "\n"
for row in zip(*csvdata.values()):
    csvfile += ",".join(row) + "\n"

>>> print(csvfile)
Name,PlaceofBirth,Surname,Gender
John,Texas,,M
Alexa,,Garden,F

While, if you want to sort columns according to your desired output:

csvfile = ""
csvfile += ",".join(cols) + "\n"
for row in zip(*[csvdata[k] for k in cols]):
    csvfile += ",".join(row) + "\n"

>>> print(csvfile)
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,

brunns · Answer 3 · 2019-04-29T12:33:21.950

0

This works for me:

with open("in.csv") as infile, open("out.csv", "w") as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    incsv.__next__()  # Skip 1st row
    outcsv.writerows(zip(*incsv))

Update: For input and output as strings:

import csv, io

with io.StringIO(indata) as infile, io.StringIO() as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    incsv.__next__()  # Skip 1st row
    outcsv.writerows(zip(*incsv))

    print(outfile.getvalue())

edited Apr 29 '19 at 12:33

answered Apr 29 '19 at 11:29

brunns

2,689
1
13
24

I need to create a csvstring in place of csv file. – Saurabh Saxena Apr 29 '19 at 11:52
The input csv which I am getting is also in form of csv string and I am not in liberty to use with (open) :( can we do it just by string manipulations. – Saurabh Saxena Apr 29 '19 at 12:28
We could use StringIO for the input data too. We *could* do it all ourselves with string functions and a state machine, but it's really awkward. CSVs are harder to parse correctly than you might think. Better to use the`csv` module. That's what it's there for. – brunns Apr 29 '19 at 12:31

Pivot a CSV string using python without using pandas or any similar library

3 Answers3

Linked