0

I have a dataframe where one column has a list of zipcodes and the other has property values corresponding to the zipcode. I want to sum up the property values in each row according to the appropriate zipcode.

So, for example:

zip  value
2210 $5,000
2130 $3,000
2210 $2,100
2345 $1,000

I would then add up the values

$5,000 + $2,100  = $7,100 

and reap the total property value of for zipcode 2210 as $7,100.

Any help in this regard will be appreciated

Nayantara Jeyaraj
  • 2,624
  • 7
  • 34
  • 63
dancemc15
  • 598
  • 2
  • 7
  • 21

1 Answers1

1

You need:

df
    zip  value
0  2210   5000
1  2130   3000
2  2210   2100
3  2345   1000

df2 = df.groupby(['zip'])['value'].sum()

df2 
zip    value
2130    3000
2210    7100
2345    1000
Name: value, dtype: int64

You can read more about it here.

Also you will need to remove the $ sign in the column values. For that you can use something along the lines of the following while reading the dataframe initially:

df = pd.read_csv('zip_value.csv', header=0,names=headers,converters={'value': lambda x: float(x.replace('$',''))})

Edit: Changed the code according to comment. To reset the index after groupby use:

df2 = df.groupby(['zip'])['value'].sum().reset_index()

Then to remove a particular column with zip value ,say, 2135 , you need

df3 = df2[df2['zip']!= 2135]
  • Thank you! I have one more question. What if I want to delete specific zip codes and their corresponding values (after summing up everything)? – dancemc15 Sep 29 '16 at 04:07
  • You can use `df3 = df2[df2['zip'] != value_not_needed]` where value_not_needed is the zip code you don't want – Harshavardhan Ramanna Sep 29 '16 at 04:10
  • Sorry I am a bit confused. Let's say I want zip 2130 and its value gone. How do I indicate so with the code you wrote above? – dancemc15 Sep 29 '16 at 04:14
  • `df3 = df2[df2.index != 2130]` – Harshavardhan Ramanna Sep 29 '16 at 04:15
  • Okay. Thank you so much for your help! Very new to pandas – dancemc15 Sep 29 '16 at 04:17
  • I'm sorry. I hope you don't mind me asking one more question. I want to get rid of multiple zip codes. So I think I should use a formula. How do I go about with that? – dancemc15 Sep 29 '16 at 04:24
  • `df3 = df2[ (df2.index != 2130) | (df2.index != 2345) ]` removes all rows where zip equals either 2130 or 2345. You can check it out in the below link http://stackoverflow.com/questions/29017525/deleting-rows-based-on-multiple-conditions-python-pandas – Harshavardhan Ramanna Sep 29 '16 at 04:28