
Suppose I have the following csv data:

first_name,last_name
tom,hanks
tom,cruise

I would like to convert this data as follows:

data = {
    'first_name': ['tom','tom'],
    'last_name': ['hanks', 'cruise']
}

What would be the best way to do this without using a library such as pandas, numpy, or csv?

pault
David542
  • Can you use the `csv` module? – pault Dec 17 '18 at 20:32
  • csv module dict reader + defaultdicts are the way to go. – alkasm Dec 17 '18 at 20:33
  • @AlexanderReynolds yea I'd agree but I'm looking to implement a pure python solution as the data may not be an actual valid csv. – David542 Dec 17 '18 at 20:34
  • I mean those are pure python, they're standard library. Still, if you don't want to use the `csv` module, you can parse each line with a `line.split(',')` to split on commas, and then append each value in the split list to the corresponding list in the dictionary. – alkasm Dec 17 '18 at 20:36
  • @timgeb no I'm just saying a solution without using the csv module. – David542 Dec 17 '18 at 20:38
  • Why, **why** can't you use the csv module? And what exactly is the issue you are encountering when you attempt to do this? – juanpa.arrivillaga Dec 17 '18 at 20:40

2 Answers


Personally, I'd go with pandas or csv but this is fairly easy to implement without any imports:

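# myfile is an open file object (or any iterable of text lines)
header = None
data = {}
for line in myfile:
    fields = line.strip().split(",")
    if header is None:
        # the first line holds the column names
        header = fields
        data = {k: [] for k in header}
    else:
        # append each value to the list for its column
        for i, value in enumerate(fields):
            data[header[i]].append(value)

print(data)
# {'first_name': ['tom', 'tom'], 'last_name': ['hanks', 'cruise']}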
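For a self-contained run, the loop above could be wrapped in a function and fed an in-memory file. This is only a sketch: `columns_from_lines` is a hypothetical name, and skipping blank lines and pairing values with `zip` (which silently drops extra fields) are assumptions, not part of the original loop.

```python
from io import StringIO


def columns_from_lines(lines):
    """Build {column_name: [values, ...]} from comma-separated text lines."""
    header = None
    data = {}
    for line in lines:
        fields = line.strip().split(",")
        if fields == [""]:
            continue  # assumption: blank lines are ignored
        if header is None:
            # the first non-blank line holds the column names
            header = fields
            data = {name: [] for name in header}
        else:
            # zip stops at the shorter sequence, so extra fields are dropped
            for name, value in zip(header, fields):
                data[name].append(value)
    return data


# StringIO stands in for a real file opened with open(...)
myfile = StringIO("first_name,last_name\ntom,hanks\ntom,cruise\n")
result = columns_from_lines(myfile)
print(result)
```

Any iterable of strings works as input, so a list of lines can be passed in place of a file object.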
pault

Faking your file:

>>> from io import StringIO
>>> file = StringIO('''first_name,last_name
... tom,hanks
... tom,cruise''')

Creating the dict:

>>> data = [(k, []) for k in next(file).strip().split(',')]
>>> for line in file:
...     for i, field in enumerate(line.strip().split(',')):
...         data[i][1].append(field)
...
>>> data = dict(data)
>>> data
{'first_name': ['tom', 'tom'], 'last_name': ['hanks', 'cruise']}

This is more of a programming exercise than a solution you should use in the real world. It is not robust and will fail on common cases, such as quoted fields that contain commas.
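To make the quoted-field problem concrete, here is a small comparison of a plain `split(',')` against the standard-library `csv.reader` on one line. The sample value `"hanks, jr"` is made up for illustration.

```python
import csv
from io import StringIO

# hypothetical row whose second field contains a comma inside quotes
line = 'tom,"hanks, jr"'

# naive splitting breaks the quoted field into two pieces
naive = line.split(",")

# csv.reader understands the quoting and keeps the field whole
parsed = next(csv.reader(StringIO(line)))

print(naive)   # ['tom', '"hanks', ' jr"']
print(parsed)  # ['tom', 'hanks, jr']
```

With the naive split, the row has three fields instead of two, so the values no longer line up with the header.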


With csv, for other readers:

>>> import csv
>>> reader = csv.reader(file)  # assume fresh StringIO instance
>>> dict(zip(next(reader), zip(*reader)))
{'first_name': ('tom', 'tom'), 'last_name': ('hanks', 'cruise')}

(Use dict(zip(next(reader), map(list, zip(*reader)))) if having lists as values is important.)
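The comments on the question also suggest combining `csv.DictReader` with a `defaultdict`. A minimal sketch of that idea, assuming the same in-memory sample data as above, might look like this:

```python
import csv
from collections import defaultdict
from io import StringIO

# same sample data as above, held in memory instead of a real file
file = StringIO("first_name,last_name\ntom,hanks\ntom,cruise\n")

columns = defaultdict(list)
# DictReader yields one {column_name: value} mapping per data row
for row in csv.DictReader(file):
    for name, value in row.items():
        columns[name].append(value)

data = dict(columns)  # convert back to a plain dict
print(data)
```

One difference from the `zip` version: if the file has a header but no data rows, this sketch produces an empty dict rather than a dict with empty columns.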

timgeb