Memory issue in python pandas

Question

I am very new to pandas. I am loading data from a CSV file into a DataFrame with read_csv. My CSV file is 328 KB, with 535 rows and 7 columns; one of those 7 columns holds JSON with roughly 25-30 keys.

My logic loops over that JSON column to find the percentile value. But when I run this Python program, my system hangs completely.

    for rowcount in df.index:       
        jdata = json.loads(df[col_nam].ix[rowcount])        
        for rowindex in df.index:                       
            if (rowcount != rowindex):
                json_data = json.loads(df[col_nam].ix[rowindex])                
                for i in range(len(jdata.keys())):
                    count = 1
                    done = True                             
                    for j in range(len(json_data.keys())):
                        if(jdata.keys()[i] == json_data.keys()[j]):
                            if done:
                                count  = 0
                            done = False                    
                            if(json_data.values()[j] < jdata.values()[i]):                          
                                count = count + 1
                                break
                    result_data.append(pd.Series([str(jdata.keys()[i]),count,len(df.index),rowcount,df['D_code'].ix[rowcount],df['t_code'].ix[rowcount]],index = ['key','lessthan_count','count','rowcount','D_code','t_code']))

MODIFIED CODE:

for rowcount in df.index:       
    jdata = json.loads(df[col_nam].ix[rowcount])        
    for rowindex in df.index:                       
        if (rowcount != rowindex):
            json_data = json.loads(df[col_nam].ix[rowindex])                
            #for i in range(len(jdata.keys())):
             for key in jdata
                count = 1
                done = True                             
                #for j in range(len(json_data.keys())):
                 for key1 in json_data
                    #if(jdata.keys()[i] == json_data.keys()[j]):
                     if(key == key1)
                        if done:
                            count  = 0
                        done = False                    
                        if(json_data[key1] < jdata.key]):                           
                            count = count + 1
                            break
                result_data.append(pd.Series([str(key),count,len(df.index),rowcount,df['D_code'].ix[rowcount],df['t_code'].ix[rowcount]],index = ['key','lessthan_count','count','rowcount','D_code','t_code']))

As I am using Ubuntu, I used top to check the memory usage. %MEM is around 72.8, but it varies.

How can I overcome this problem? My system has 5 GB of RAM. Please share your ideas.
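For a sense of scale, here is a rough count of how many pd.Series objects the nested loops would append. It is only a sketch: it assumes result_data is a plain list (its definition is not shown) and uses the figures stated above, 535 rows and about 25-30 keys per JSON cell. The helper name appended_series is made up for illustration.

```python
# Rough count of the pd.Series objects the nested loops append to result_data.
# Figures come from the question: 535 rows, roughly 25-30 keys per JSON cell.
n_rows = 535


def appended_series(n_rows, keys_per_row):
    # One append per ordered pair of distinct rows, per key of the outer row.
    return n_rows * (n_rows - 1) * keys_per_row


low = appended_series(n_rows, 25)
high = appended_series(n_rows, 30)
print(low, high)  # 7142250 8570700
```

Several million small Series objects, each carrying its own index, would likely account for the memory growth far more than the 328 KB CSV itself.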

EDIT:

This is a sample of the JSON column. It is not fixed; its length varies.

{"IND_EC_10_A":1.41,"IND_EC_10_C":3.09,"IND_EC_10_D":3.66,"IND_EC_10_F":1.08,"IND_EC_10_G":1.21,"IND_EC_10_H":2.01,"IND_EC_10_I":1.26,"IND_EC_10_J":2.17,"IND_EC_10_K":1.63,"IND_EC_10_L":12.47,"IND_EC_10_M":3.42,"IND_EC_10_N":1.70,"IND_EC_10_O":1.35}

EDIT:

Instead of looping over the JSON, I tried to separate the keys and values into columns so that I can find the percentile value from them. I followed this link, Pandas expand json field across records, but it's not working for me.

def json_to_series(text):
    keys, values = zip(*[item for dct in json.loads(text) for item in dct])
    return pd.Series(values, index=keys) 
result = df["json col name"].apply(json_to_series)

I got the error ValueError: need more than 1 value to unpack.
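A possible reason for that error: with a flat JSON object like the sample above, json.loads returns a single dict, not a list of dicts, so the nested comprehension ends up iterating over the characters of each key. A minimal sketch of the expansion under that assumption follows. The frame, the column name "json_col", and the shortened values are hypothetical, and "how many rows have a smaller value for this key" is only a guess at the intended percentile logic.

```python
import json

import pandas as pd

# Hypothetical three-row frame; "json_col" stands in for the real column name.
df = pd.DataFrame({
    "D_code": ["d1", "d2", "d3"],
    "json_col": [
        '{"IND_EC_10_A": 1.41, "IND_EC_10_C": 3.09}',
        '{"IND_EC_10_A": 2.00, "IND_EC_10_C": 1.00, "IND_EC_10_D": 3.66}',
        '{"IND_EC_10_A": 0.50}',
    ],
})

# Parse each cell once; keys missing from a row become NaN in the wide frame.
wide = pd.DataFrame([json.loads(text) for text in df["json_col"]], index=df.index)

# For every key (column), count the rows holding a strictly smaller value.
# rank(method="min") gives tied values the lowest rank, so rank - 1 is that count.
less_than = wide.rank(method="min") - 1

# Percentile-style rank in (0, 1] per key; NaN stays NaN.
pct_rank = wide.rank(pct=True)

print(less_than)
```

This parses each JSON cell once and produces one small frame, instead of one Series per row pair and key.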

Can we see a sample of the data? This code looks awfully complex for what you're describing... — James Mills, Apr 30 '15 at 05:37
As a side note, instead of doing `for i in range(len(jdata.keys())):` and then using `jdata.keys()[i]` over and over, just do `for key in jdata:`. If you need `i` as well as `key`, you can do `for i, key in enumerate(jdata):`, but it doesn't look like you need it anywhere. — abarnert, Apr 30 '15 at 05:49
Meanwhile, what's the `str(jdata.keys()[i])` for? Are you storing a dict as its string representation or something? If so, why? — abarnert, Apr 30 '15 at 05:51
You are creating a lot of copies of keys / values (each call to `.keys()` or `.values()` creates a new list). Also, `for i in range(len(jdata.keys())):` creates an unneeded list of integers. Have you considered using the `itertools` module? Lazy evaluation is usually a better idea than keeping everything in memory multiple times. — Łukasz Rogalski, Apr 30 '15 at 05:52
@Nsh: As much as I love `itertools`, there's no need for it here. If this is Python 3, those are already all lazy; if it's Python 2, he can just use `xrange` and `iterkeys`/`itervalues` (or, for 2.7, `viewkeys`/`viewvalues`). Or, as I said above, just use `for key in jdata:` in the first place and not bother with trying to get it right or optimize it… — abarnert, Apr 30 '15 at 05:54
In the new version, you've only half-way changed it; you've still got some `jdata.keys()[i]` lying around, but you no longer have anything named `i`, so that's just going to raise an exception. — abarnert, Apr 30 '15 at 06:04
When you say "json column", do you mean that the actual json is a list of these? At the moment I don't see how this should make a DataFrame... — Andy Hayden, Apr 30 '15 at 06:09
Still wrong. It's not `jdata.keys()[key]`, just `key`. And if you need the values, you can use `jdata[key]` (it is a `dict`, after all), or you can change the loop to `for key, value in jdata.items():` (or `viewitems` for 2.7 or `iteritems` for 2.6, for the reason Nsh pointed out). — abarnert, Apr 30 '15 at 06:11
No, not `jdata.key` either. Just `key`. Maybe it'll help to think of something simpler. If you have `x = [5, 6, 7]`, and you do `for i in x`, the values of `i` are `5`, `6`, and `7`. You don't want `x[5]` or `x.5` or anything like that, just `5`. — abarnert, Apr 30 '15 at 06:24
At least in the line `if(json_data[key1] < jdata.key]):`. Which is also a syntax error because of the stray `]`. At any rate, I can't iteratively debug your code for you through StackOverflow comments like this. — abarnert, Apr 30 '15 at 06:59
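The iteration idiom suggested in the comments above can be sketched roughly as follows. This is an illustration for Python 3 and not a drop-in replacement for the posted loop: the function name compare and the two small dicts are made up, and the rule used here (1 if the other row has the key with a smaller value, else 0) is a simplified assumption about the intent.

```python
import json

# Two made-up rows, shaped like the sample JSON in the question.
jdata = json.loads('{"IND_EC_10_A": 1.41, "IND_EC_10_C": 3.09}')
json_data = json.loads('{"IND_EC_10_A": 1.08, "IND_EC_10_D": 3.66}')


def compare(jdata, json_data):
    """For each key in jdata, record 1 if json_data has a smaller value for it, else 0."""
    result = {}
    for key, value in jdata.items():
        # Direct dict lookup replaces the inner loop over json_data's keys.
        result[key] = int(key in json_data and json_data[key] < value)
    return result


print(compare(jdata, json_data))  # {'IND_EC_10_A': 1, 'IND_EC_10_C': 0}
```

Looking the key up in the second dict removes the innermost loop entirely, and no index variable or `.keys()` list is needed.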
