0

So I have a few CSV files in the following format:

person,age,nationality,language
Jack,18,Canadian,English
Rahul,25,Indian,Hindi
Mark,50,American,English
Kyou, 21, Japanese, English

I need to import that, and return that data as a dictionary, with the keys as the column headings in the first row, and all the data in each column as values for that specific key. For example:

dict = {
    'person': ['Jack', 'Rahul', 'Mark', 'Kyou'],
    'age': [18, 25, 50, 21],
    'nationality': ['Canadian', 'Indian', 'American', 'Japanese'],
    'language': ['English', 'Hindi', 'English', 'English']
}

Any idea how I would begin this code and make it so that the code would work for any number of columns given in a .csv file?

Brian Tompsett - 汤莱恩
user3033494
  • Have a look at the `csv` module. Particularly `DictReader`. It should allow you to get `[{'person': 'jack', 'age': '18', ...}, ...]`. From there it's a simple transform to get what you want. – mgilson Nov 25 '13 at 18:29
  • The `csv` module lets you convert it to a `list` of `dicts`. – RyPeck Nov 25 '13 at 18:30
  • I need to output the result as a dictionary though, not a list. – user3033494 Nov 25 '13 at 18:33
  • Not quite a duplicate - related, but that question wants to use a column value as keys, where this question wants to use headers as keys. – Peter DeGlopper Nov 25 '13 at 18:54
  • Not a duplicate exactly, I needed to use the first row as the key, as opposed to the 1st column in that question – user3033494 Nov 25 '13 at 19:33

4 Answers

3

I'd go for something like:

import csv

with open('input', newline='') as fin:
    csvin = csv.reader(fin)
    header = next(csvin, [])
    # zip(*csvin) transposes the remaining rows into columns
    print(dict(zip(header, zip(*csvin))))

# {'person': ('Jack', 'Rahul', 'Mark', 'Kyou'), 'age': ('18', '25', '50', ' 21'), 'nationality': ('Canadian', 'Indian', 'American', ' Japanese'), 'language': ('English', 'Hindi', 'English', ' English')}

Adapt accordingly.
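
One possible adaptation, as a rough sketch: passing `skipinitialspace=True` to `csv.reader` drops the space after each comma (as in the `Kyou` row), and all-digit columns are converted to `int` so the result has the shape asked for in the question. The helper name `columns_from_csv`, the in-memory sample, and the digits-only rule for spotting numeric columns are assumptions for illustration, not part of the answer above.

```python
import csv
import io

def columns_from_csv(fileobj):
    """Return {header: [column values]} for an open CSV file object."""
    reader = csv.reader(fileobj, skipinitialspace=True)
    header = next(reader, [])
    result = {}
    for name, column in zip(header, zip(*reader)):
        # Assumption: treat a column as numeric only if every value is all digits.
        if all(value.isdigit() for value in column):
            result[name] = [int(value) for value in column]
        else:
            result[name] = list(column)
    return result

# In-memory stand-in for the file in the question (hypothetical sample).
sample = io.StringIO(
    "person,age,nationality,language\n"
    "Jack,18,Canadian,English\n"
    "Rahul,25,Indian,Hindi\n"
    "Mark,50,American,English\n"
    "Kyou, 21, Japanese, English\n"
)
data = columns_from_csv(sample)
print(data)
```

With a real file, `open('input', newline='')` would take the place of `io.StringIO`.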

Jon Clements
  • This is nice and concise. One thing to watch out for is that this will break on ragged rows - that is, if any omit `language` and the trailing comma after `nationality`. `reader` instances don't fill in missing fields, and `zip` truncates to the shortest sequence. Whether or not that's a real concern depends on the situation. – Peter DeGlopper Nov 25 '13 at 18:46
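
To illustrate the ragged-row point, here is a minimal sketch that compares plain `zip` with `itertools.zip_longest`, which pads short rows instead of truncating every column to the shortest row. The sample data and the choice of `None` as the fill value are assumptions for illustration.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical sample: the second data row omits the language field.
ragged = io.StringIO(
    "person,age,nationality,language\n"
    "Jack,18,Canadian,English\n"
    "Rahul,25,Indian\n"
)
reader = csv.reader(ragged)
header = next(reader, [])
rows = list(reader)

# Plain zip stops at the shortest row, so the 'language' column is lost.
truncated = dict(zip(header, zip(*rows)))

# zip_longest keeps every column and fills the gaps with None.
padded = dict(zip(header, zip_longest(*rows, fillvalue=None)))

print(truncated)
print(padded)
```

Whether padding, skipping, or raising an error is the right response to a short row depends on the data.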
2

Using the csv module, I would do it this way:

import csv

with open('somefile.csv', newline='') as input_file:
    reader = csv.DictReader(input_file)
    results = {}
    for linedict in reader:
        # Append each field to the list collected for its column header.
        for key, value in linedict.items():
            results.setdefault(key, []).append(value)
Peter DeGlopper
1

Here is a fairly straightforward solution that uses the Python `csv` module (docs here: http://docs.python.org/2/library/csv.html). Just replace 'csv_data.csv' with the name of your CSV file.

import csv

with open('csv_data.csv') as csv_data:
    reader = csv.reader(csv_data)

    # eliminate blank rows if they exist
    rows = [row for row in reader if row]
    headings = rows[0] # get headings

    person_info = {}
    for row in rows[1:]:
        # append the dataitem to the end of the dictionary entry
        # set the default value of [] if this key has not been seen
        for col_header, data_column in zip(headings, row):
            person_info.setdefault(col_header, []).append(data_column)

    print(person_info)
saxman01
0

You could use zipping combined with slicing in a dict comprehension, once you've gotten the data into a list of lists (header row first) with the csv module.

{col[0] : col[1:] for col in zip(*rows)}
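
For completeness, a self-contained sketch of how `rows` might be built before applying that comprehension. The in-memory sample data and the name `columns` are assumptions for illustration; with a real file, `open(...)` would replace `io.StringIO`. Each value comes out as a tuple of strings.

```python
import csv
import io

# Hypothetical two-column sample; the first row holds the headers.
sample = io.StringIO(
    "person,age\n"
    "Jack,18\n"
    "Rahul,25\n"
)

# List of lists, header row included.
rows = list(csv.reader(sample))

# zip(*rows) transposes rows into columns; col[0] is the header,
# col[1:] is the rest of that column.
columns = {col[0]: col[1:] for col in zip(*rows)}
print(columns)
```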
Silas Ray