I am very new to pandas. I am loading data from a CSV file into a DataFrame with read_csv. The CSV file is 328 KB, with 535 rows and 7 columns; one of those 7 columns holds JSON with roughly 25-30 keys per row.
My logic loops over that JSON column to find the percentile value. But when I run this Python program, my system hangs completely.
for rowcount in df.index:
    jdata = json.loads(df[col_nam].ix[rowcount])
    for rowindex in df.index:
        if (rowcount != rowindex):
            json_data = json.loads(df[col_nam].ix[rowindex])
            for i in range(len(jdata.keys())):
                count = 1
                done = True
                for j in range(len(json_data.keys())):
                    if(jdata.keys()[i] == json_data.keys()[j]):
                        if done:
                            count = 0
                            done = False
                        if(json_data.values()[j] < jdata.values()[i]):
                            count = count + 1
                        break
                result_data.append(pd.Series([str(jdata.keys()[i]),count,len(df.index),rowcount,df['D_code'].ix[rowcount],df['t_code'].ix[rowcount]],index =
                    ['key','lessthan_count','count','rowcount','D_code','t_code']))
MODIFIED CODE:
for rowcount in df.index:
    jdata = json.loads(df[col_nam].ix[rowcount])
    for rowindex in df.index:
        if (rowcount != rowindex):
            json_data = json.loads(df[col_nam].ix[rowindex])
            #for i in range(len(jdata.keys())):
            for key in jdata:
                count = 1
                done = True
                #for j in range(len(json_data.keys())):
                for key1 in json_data:
                    #if(jdata.keys()[i] == json_data.keys()[j]):
                    if(key == key1):
                        if done:
                            count = 0
                            done = False
                        if(json_data[key1] < jdata[key]):
                            count = count + 1
                        break
                result_data.append(pd.Series([str(key),count,len(df.index),rowcount,df['D_code'].ix[rowcount],df['t_code'].ix[rowcount]],index = ['key','lessthan_count','count','rowcount','D_code','t_code']))
As I am on Ubuntu, I used top to check the memory usage. %MEM is around 72.8, but it varies.
How can I overcome this problem? My system has 5 GB of RAM. Please share your ideas.
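A possible explanation, offered as a guess rather than a diagnosis: the loops above append one pd.Series for every (row, other row, key) combination, which for 535 rows and 25-30 keys is on the order of several million Series objects, and each JSON cell is re-parsed on every pass. Below is a minimal sketch of one way to avoid both, assuming the goal is a per-key "how many rows have a smaller value" count. The function name less_than_counts and the sample strings are made up for illustration, and only the standard library is used.

```python
import json
from bisect import bisect_left
from collections import defaultdict

def less_than_counts(json_cells):
    """For each (row, key), count how many rows hold a smaller value for that key."""
    # Parse every JSON cell exactly once instead of once per row pair.
    parsed = [json.loads(cell) for cell in json_cells]

    # Collect all values seen for each key, then sort each list once.
    values_by_key = defaultdict(list)
    for record in parsed:
        for key, value in record.items():
            values_by_key[key].append(value)
    for values in values_by_key.values():
        values.sort()

    results = []
    for row_number, record in enumerate(parsed):
        for key, value in record.items():
            results.append({
                "key": key,
                "rowcount": row_number,
                # bisect_left returns the number of entries strictly below value.
                "lessthan_count": bisect_left(values_by_key[key], value),
                "count": len(parsed),  # total rows, mirroring len(df.index)
            })
    return results

# Hypothetical cells standing in for the real JSON column.
sample = [
    '{"IND_EC_10_A": 1.41, "IND_EC_10_C": 3.09}',
    '{"IND_EC_10_A": 2.00, "IND_EC_10_C": 1.50}',
    '{"IND_EC_10_A": 0.90}',
]
rows = less_than_counts(sample)
```

This produces one plain dict per (row, key) instead of one Series per row pair, so the result can be turned into a DataFrame in a single step at the end, for example with pd.DataFrame(rows).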
EDIT:
This is a sample value from the JSON column. Its length is not fixed; the number of keys varies from row to row.
{"IND_EC_10_A":1.41,"IND_EC_10_C":3.09,"IND_EC_10_D":3.66,"IND_EC_10_F":1.08,"IND_EC_10_G":1.21,"IND_EC_10_H":2.01,"IND_EC_10_I":1.26,"IND_EC_10_J":2.17,"IND_EC_10_K":1.63,"IND_EC_10_L":12.47,"IND_EC_10_M":3.42,"IND_EC_10_N":1.70,"IND_EC_10_O":1.35}
EDIT:
Instead of looping over the JSON, I tried to split the keys and values into columns so that I can find the percentile value from those. I followed the linked answer "Pandas expand json field across records", but it is not working for me.
def json_to_series(text):
    keys, values = zip(*[item for dct in json.loads(text) for item in dct])
    return pd.Series(values, index=keys)

result = df["json col name"].apply(json_to_series)
I got this error: ValueError: need more than 1 value to unpack
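The error most likely comes from the shape of the data: here json.loads(text) returns a single dict, so "for dct in json.loads(text)" iterates over the key strings and "for item in dct" over their individual characters, leaving zip with only one tuple to unpack. A minimal sketch of an alternative, assuming every cell holds one flat JSON object like the sample above (the two-row frame is invented for illustration):

```python
import json
import pandas as pd

def json_to_series(text):
    # Assumption: each cell is ONE flat JSON object, so the parsed dict
    # maps directly onto a Series whose index is the JSON keys.
    return pd.Series(json.loads(text))

# Hypothetical two-row frame standing in for the real CSV data.
df = pd.DataFrame({
    "json col name": [
        '{"IND_EC_10_A": 1.41, "IND_EC_10_C": 3.09}',
        '{"IND_EC_10_A": 2.00}',
    ]
})

# One column per JSON key; keys missing from a row become NaN.
expanded = df["json col name"].apply(json_to_series)

# Percentile rank of each value within its own column (NaN stays NaN).
percentiles = expanded.rank(pct=True)
```

Because rows with fewer keys are simply filled with NaN, the varying JSON length does not need special handling in this sketch.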