-2

I am not sure where I went wrong with the code below, in which I used two for loops: the outer one iterates over each state name, and the inner one was meant to iterate over each row (dictionary) containing that state name.

I eventually solved the problem with my second attempt (the code on the right of the screenshot), but I would like to know why the first one didn't work.

The file used is a census file with statename, countyname (a subdivision of the state) and population being the columns.

The first attempt (on the left of the screenshot) fails with the error 'string indices must be integers':


Ted
  • This is not the way to ask a question, and your question will most likely get closed. I suggest you spend some time learning how to ask a good question so that we can give you good answers. Read [this](https://stackoverflow.com/help/minimal-reproducible-example) and [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Erfan Aug 21 '19 at 15:01
  • What you probably want is: `census_df.groupby('STNAME').cumcount` – Erfan Aug 21 '19 at 15:04

2 Answers

0

I don't know why the picture is not showing up. Sorry, first-timer here!
The first code I tried, which raises 'string indices must be integers', is:

state_unique=census_df['STNAME'].unique()
list=[]
def answer_five():
    for c in state_unique:
        count=0
        for d in census_df:
            if d['STNAME']==c:
               count+=1
        return list.append(count)
answer_five()

The second code, which resolved my problem, is:

max_county=[]
state_unique=census_df['STNAME'].unique()
def answer_five():
    for c in state_unique:
        df1=census_df[census_df['STNAME']==c]
        max_county.append(len(df1))
    return max(max_county)
answer_five()
0

As others have already suggested, please read up on providing a Minimal, Reproducible Example. Nevertheless, I can see what went wrong here. When you loop with `for d in census_df`, you are actually looping through the column names of your data frame, i.e. SUMLEV, REGION etc., not its rows. This is presumably not what you had in mind.

Your next line, `if d['STNAME']==c`, then raises the error because, as the message says, string indices must be integers. Here `d` is a column name (a string), and you are trying to index it with another string, `'STNAME'`.
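To see both effects in isolation, here is a minimal sketch. The tiny data frame is a hypothetical stand-in for `census_df` (only the column names STNAME and CTYNAME come from the question; the rows are made up for illustration), and `labels` and `error_message` are names introduced just for this example.

```python
import pandas as pd

# Hypothetical stand-in for census_df; the rows are illustrative only.
census_df = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alaska"],
    "CTYNAME": ["Autauga County", "Baldwin County", "Aleutians East Borough"],
})

# Iterating a DataFrame directly yields its column labels, not its rows.
labels = [d for d in census_df]
print(labels)

# Each d is therefore a plain str, and indexing a str with a str raises TypeError.
try:
    labels[0]["STNAME"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The exact wording of the message varies slightly between Python versions, but it is the same "string indices must be integers" error reported in the question.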

If you really want that first method to work, try using iterrows:

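# Assumes census_df has already been loaded from the census file.
state_unique = census_df['STNAME'].unique()

def answer_five():
    counts = []  # one row count per state; avoids shadowing the built-in list
    for c in state_unique:
        count = 0
        # iterrows() yields (index, row) pairs, so row['STNAME'] is valid
        for index, row in census_df.iterrows():
            if row['STNAME'] == c:
                count += 1
        counts.append(count)
    return max(counts)

answer_five()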
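For comparison, the same count can probably be obtained without any explicit loop. The following is only a sketch under assumptions: the sample data frame is hypothetical (a real `census_df` would be loaded from the census file), and `rows_per_state`, `max_rows` and `top_state` are names made up for this example.

```python
import pandas as pd

# Hypothetical sample data standing in for the real census file.
census_df = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alaska", "Texas", "Texas", "Texas"],
    "CTYNAME": [
        "Autauga County", "Baldwin County", "Aleutians East Borough",
        "Anderson County", "Andrews County", "Angelina County",
    ],
})

# Count the rows for each state in a single grouped operation.
rows_per_state = census_df.groupby("STNAME").size()

max_rows = int(rows_per_state.max())  # largest number of rows for any state
top_state = rows_per_state.idxmax()   # the state with that many rows
print(max_rows, top_state)
```

`groupby(...).size()` does the per-state counting that the nested loops do by hand, and it scans the data once instead of once per state.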
Ted
  • Cool, thanks! I thought that in `for d in census_df`, `d` would be each dictionary, as in each row rather than each column? I reckon that if the column I had to iterate over contained only integers, this issue wouldn't happen? – Jane Zhang Aug 21 '19 at 15:57
  • @JaneZhang Glad to hear it helped. Feel free to "accept" the answer and upvote if you like. Even if the column name was an integer (or the data in that column were just integers), your `for d in census_df` would still iterate over the columns. In general, using for loops with data frames is a bad idea - getting used to the in-built Pandas functions will be of huge benefit to you in the long term. – Ted Aug 21 '19 at 16:08
  • Cheers Tom! That helped! Agreed, pandas is quite smart. But as a beginner, I somehow tend to bring the mindset of getting through it with tedious for loops... – Jane Zhang Aug 21 '19 at 16:53