
I have two datasets:

- population: the population of each US state, ordered alphabetically.

- data: more than 200,000 rows, one per incident.

population.head()

    state       population
0   Alabama     4887871
1   Alaska      737438
2   Arizona     7171646
3   Arkansas    3013825
4   California  39557045

I'm trying to add a new column called "incidents" to population, computed from the other dataset.

I tried: population['incidents'] = data.state.value_counts().sort_index()

but I'm getting the following result:

    state       population  incidents
0   Alabama     4887871      NaN
1   Alaska      737438       NaN
2   Arizona     7171646      NaN
3   Arkansas    3013825      NaN
4   California  39557045     NaN

What can I do to fix this?

EDIT: output of data.state.value_counts().sort_index():

Alabama                  5373
Alaska                   1292
Arizona                  2268
Arkansas                 2753
California              15975
Colorado                 3069
Connecticut              2984
Delaware                 1643
District of Columbia     3091
Florida                 14610
Georgia                  8717
EBBOOO
  • What does your data look like? Can you post a small example of it? – Poojan Nov 17 '19 at 20:38
  • Use: `population.merge(data, on='state', how='left')` – Erfan Nov 17 '19 at 21:56
  • @Poojan data is a list of gun incidents in the US. You can find the original dataset here: https://www.kaggle.com/jameslko/gun-violence-data – EBBOOO Nov 18 '19 at 00:08
  • @Erfan unfortunately this wouldn't work. I want to run value_counts on data, then add the result to the population DataFrame – EBBOOO Nov 18 '19 at 00:09
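
The merge suggested in the comments can still fit this goal if the incidents are counted first and only the counts are merged. A minimal sketch, assuming toy stand-ins for both frames (the rows in `data` here are made up, and `counts` and `merged` are illustrative names):

```python
import pandas as pd

# Toy stand-ins for the two frames in the question.
population = pd.DataFrame({
    "state": ["Alabama", "Alaska", "Arizona"],
    "population": [4887871, 737438, 7171646],
})
data = pd.DataFrame({"state": ["Alabama", "Arizona", "Alabama"]})  # made-up rows

# Count rows per state and turn the result into a two-column frame.
counts = data.groupby("state").size().reset_index(name="incidents")

# A left merge keeps every row of population; states with no rows in data get NaN.
merged = population.merge(counts, on="state", how="left")
print(merged)
```

`groupby("state").size()` gives the same per-state counts as `value_counts()`; it is used here only because the column names after `reset_index` are easier to control.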

2 Answers


If you want to add a specific column from one dataset to the other, you can do it like this: `population['incidents'] = data[['columntoappend']]`. The right-hand side must be a single column, which it is not in your case. See https://www.google.com/amp/s/www.geeksforgeeks.org/adding-new-column-to-existing-dataframe-in-pandas/amp/

  • Hi Tauseef, I'm not trying to merge the two datasets; I'm trying to add the result of `data.state.value_counts().sort_index()` to the population DataFrame. I'm not getting any errors, but the column is filled with NaN only! – EBBOOO Nov 18 '19 at 00:11

The way to do this is as follows, provided that the counts and the population rows have the same length and are in the same order:

population['incidents'] = [x for x in data.state.value_counts().sort_index()]

Your approach results in NaN because pandas aligns a Series with the DataFrame's index when you assign it as a column. population is indexed 0, 1, 2, ... while the value_counts() result is indexed by state name, so no labels match and every row is filled with NaN. A plain list has no index, so with the list comprehension one value is assigned to each row by position.
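
To avoid depending on row order and length at all, the counts can be looked up by state name instead. A minimal sketch, assuming toy stand-ins for both frames (the rows in `data` are made up; `counts` and `misaligned` are illustrative names):

```python
import pandas as pd

# Toy stand-ins for the two frames in the question.
population = pd.DataFrame({
    "state": ["Alabama", "Alaska", "Arizona"],
    "population": [4887871, 737438, 7171646],
})
data = pd.DataFrame(
    {"state": ["Alaska", "Alabama", "Alabama", "Arizona", "Alabama"]}  # made-up rows
)

counts = data["state"].value_counts()  # Series indexed by state name

# Assigning the Series directly aligns on index labels: 0, 1, 2 never match
# "Alabama", "Alaska", ..., so every row ends up NaN.
misaligned = population.assign(incidents=counts)
print(misaligned["incidents"].isna().all())

# map() looks each state name up in counts, so order and length do not matter.
population["incidents"] = population["state"].map(counts)
print(population)
```

A state that appears in population but not in data would get NaN here; chaining `.fillna(0)` after the `map` call is one way to turn that into a zero count.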

kerwei