I imported two datasets in my notebook. One which consists of the names that were given to baby's for each state in each year (and the number, [count] of it). The other one consists of the total number of people in the U.S with that name with that specific name. (see added images).
The problem here is that I appended the two datasets, so that they became one dataset, since merging and concat didn't work.
I want to create a column 'Totalcount' next to all the columns where the state is also mentioned (not NAN), so that I can compare the babynames to the national count of that specific name for each state.
For example: in 1934, 4 baby's have been named 'Anna' in Texas. The number of people named 'Anna' in the U.S were 14000 in 1934. The number of 'totalcount' should be 14000 then. This number should be 14000 for next to the other states as well. So: 5 baby's have been named 'Anna' in California, and total count would still be 14000. I have tried many things but I can't figure it out.
example states dataset:
df = pd.DataFrame([['AF',1910.0, 'F', 'Mary', 14.0], ['TX',1910.0, 'M' , 'Lex', 15.0], ['CA',1920.0,'M','Tom', 14.0]], columns=['State', 'Sex','Year','Name','Count'])
example national dataset:
df2 = pd.DataFrame([['Anna','F', 2402,1910], ['Ben','M',5044,1912], ['Simon','M',39203,1910]], columns=['Name', 'Sex','Count','Year'])
df2