
I am working with a data set from a csv file.

I import the data into a dictionary with `dic = df.to_dict()`.

This works well, but because of the way the data is structured I get a dictionary of dictionaries. The nested dictionaries contain multiple `nan` values. I need to remove all of them; the result can stay nested or be flattened into a normal dictionary.

The data prints from the dictionary in this format:

{'1/13/2018': {0: 'Monday', 1: 'Red', 2: 'Violet', 3: 'Aqua', 4: 'Pink', 5: 'White', 6: nan, 7: nan, 8: nan},

Here is a sample of my code:

df = pd.read_csv(infile, parse_dates=True, infer_datetime_format=True)
dic = df.to_dict()

I have tried the advice here and attempted to do this with a comprehension, but because of the nesting I am not sure how to adapt it.

I have also tried looping in this way:

value_list = []
key_list = []

for k, v in dic.items():
    key_list.append(k)
    for c, q in v.items():
        if str(q) != 'nan':
            value_list.append(q)
        else:
            pass

I was hoping I could then build a new dict from the two lists. However, the data blurs together and it becomes hard to tell which values belong to which key. There must be a better, more Pythonic way to do this.

Joe
  • How about removing the nan's before calling `df.to_dict`? Have a look at [`pandas.DataFrame.dropna`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html). – Graipher Feb 08 '18 at 16:15
  • @Graipher That looks like what I need, but it is not working correctly. When I use it, all the nans are dropped, but so are most of the values. Only 1 value is now returned for each key – Joe Feb 08 '18 at 16:26
  • It is hard to say why it does not work without seeing your dataframe. Perhaps you are just missing the right `axis` argument. – Graipher Feb 08 '18 at 16:28
  • @Graipher I think the nested nature of the dicts is screwing it up. I set index to 1 and it worked but only returned three of the 20+ keys. When I set it to 0 I get the previous behavior – Joe Feb 08 '18 at 16:35
  • I think the reason I was getting reduced results is that dropna drops entire columns or rows. I need just the nans in a column dropped, not the entire column – Joe Feb 08 '18 at 16:48
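
The per-column behaviour described in the last comment can be sketched as follows. This is only an illustration under assumptions: the frame below is a made-up stand-in for the real CSV, and calling `dropna` on each column separately is one possible approach, not something the commenters proposed in this form.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the sample output in the question.
df = pd.DataFrame({
    "1/13/2018": ["Monday", "Red", "Violet", np.nan, np.nan],
    "1/14/2018": ["Tuesday", "Blue", np.nan, np.nan, np.nan],
})

# Drop NaNs one column at a time, so a missing value in one column
# does not remove an entire row or column of the frame.
dic = {col: series.dropna().to_dict() for col, series in df.items()}
print(dic)
```

Because each column is cleaned on its own, the inner dictionaries may end up with different lengths, which `DataFrame.dropna` on the whole frame cannot produce.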

1 Answer


Recursion, dear OP:

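from math import isnan


def remove_nans(d):
    """Delete NaN values from d in place, recursing into nested dicts."""
    # Iterate over a copy of the keys so entries can be deleted safely.
    for key in list(d):
        value = d[key]
        # isinstance also matches float subclasses such as numpy.float64.
        if isinstance(value, float) and isnan(value):
            del d[key]
        elif isinstance(value, dict):
            remove_nans(value)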

Call remove_nans on your dict and it will do the work. Note that it modifies the dict in place and returns nothing.
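
If changing the original dict is not wanted, a non-mutating variant might look like the sketch below. The name `without_nans` is hypothetical and not part of the answer, and the sample data is copied from the output shown in the question.

```python
from math import isnan


def without_nans(d):
    """Return a new dict with NaN values removed at every nesting level."""
    cleaned = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Recurse so nested dictionaries are cleaned as well.
            cleaned[key] = without_nans(value)
        elif not (isinstance(value, float) and isnan(value)):
            cleaned[key] = value
    return cleaned


nan = float("nan")
dic = {'1/13/2018': {0: 'Monday', 1: 'Red', 2: 'Violet', 3: 'Aqua',
                     4: 'Pink', 5: 'White', 6: nan, 7: nan, 8: nan}}

result = without_nans(dic)
print(result)
```

The input `dic` keeps its nan entries here; only `result` is cleaned.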

Abhishek Kumar