6

I am using Python's csv.DictReader to read in values from a CSV file to create a dictionary where keys are first row or headers in the CSV and other rows are values. It works perfectly as expected and I am able to get a dictionary, but I only want certain keys to be in the dictionary rather than all of the column values. What is the best way to do this? I tried using csv.reader but I don't think it has this functionality. Maybe this can be achieved using pandas?

Here is the code I was using with CSV module where Fieldnames was the keys that I wanted to retain in my dict. I realized it isn't used for what I described above.

import csv
with open(target_path+target_file) as csvfile:
    reader = csv.DictReader(csvfile,fieldnames=Fieldnames)
    for i in reader:
        print i
Garrett Kadillak
  • 1,026
  • 9
  • 18
anekix
  • 2,393
  • 2
  • 30
  • 57

2 Answers2

13

You can do this very simply using pandas.

import pandas as pd

# get only the columns you want from the csv file
df = pd.read_csv(target_path + target_file, usecols=['Column Name1', 'Column Name2'])
result = df.to_dict(orient='records')

Sources:

Onel Harrison
  • 1,244
  • 12
  • 14
  • 1
    thanks for reply but is the end result a list of dicts? i wanted to convert each row into a dict with with some keys. let me try this . thanks :) – anekix Jun 04 '17 at 18:31
  • 1
    It didn't before, but the updated answer should work for you once you change the column names to be what you have in your data. – Onel Harrison Jun 04 '17 at 18:42
  • can i also rename some of the existing key names like i provide a mapping between old keys and new keys? – anekix Jun 04 '17 at 19:04
  • 1
    Here's a SOA for renaming existing key names https://stackoverflow.com/questions/11346283/renaming-columns-in-pandas#11354850 – Onel Harrison Jun 04 '17 at 23:48
  • thanks again just one last thing there are some columns with empty values in csv file so in dictionary it become `nan` .is there any way to make hem default to `0` for empty cells in csv. should i ask a seperate question for it? – anekix Jun 05 '17 at 02:31
  • i treis .fillna(0) but `nan` is replaced with `0.0` instead of `0` – anekix Jun 05 '17 at 02:43
  • .also i noticed some field are being converted to `floats` by adding `.0` to them. is it a know behaviour?? – anekix Jun 05 '17 at 02:48
  • @anekix your follow up questions deviate from main question. I'd recommend investing sometime in thoroughly learning pandas as it solves many of the problems you are asking and creating a new question if you run into more problems. – Onel Harrison Jun 10 '17 at 21:50
5

You can use the to_dict method to get a list of dicts:

import pandas as pd

df = pd.read_csv(target_path+target_file, names=Fieldnames)

records = df.to_dict(orient='records')

for row in records:
    print row

to_dict documentation:

In [67]: df.to_dict?
Signature: df.to_dict(orient='dict')
Docstring:
Convert DataFrame to dictionary.

Parameters
----------
orient : str {'dict', 'list', 'series', 'split', 'records', 'index'}
    Determines the type of the values of the dictionary.

    - dict (default) : dict like {column -> {index -> value}}
    - list : dict like {column -> [values]}
    - series : dict like {column -> Series(values)}
    - split : dict like
      {index -> [index], columns -> [columns], data -> [values]}
    - records : list like
      [{column -> value}, ... , {column -> value}]
    - index : dict like {index -> {column -> value}}

      .. versionadded:: 0.17.0

    Abbreviations are allowed. `s` indicates `series` and `sp`
    indicates `split`.

Returns
-------
result : dict like {column -> {index -> value}}
File:      /usr/local/lib/python2.7/dist-packages/pandas/core/frame.py
Type:      instancemethod
sirfz
  • 4,097
  • 23
  • 37
  • thanks a lot for your reply wish i could accept more than one answer.but can you please tell if there is any way that i can rename some existing keys by providing a mapping or something? – anekix Jun 04 '17 at 19:05
  • The `names` attribute allows you to provide your own column names. If your file contains a header, just include `header=0` (i.e. 1st line) to make sure you replace the header with your own mappings in `names`. In addition, if you wish to only keep specific columns, you can specify them in the `usecols` parameter. – sirfz Jun 04 '17 at 19:09
  • i want to keep specefic columns but after that rename those columns – anekix Jun 04 '17 at 19:13
  • That's exactly what the `names` parameter allows you to do. Refer to `pandas.read_csv` documentation for details and play around with it to get a better grasp of its features. – sirfz Jun 04 '17 at 19:14
  • looking into it – anekix Jun 04 '17 at 19:15