I am trying to create a pandas DataFrame "B" by reading another DataFrame "A" row by row.

I want to fill the cells of the new DataFrame "B" by counting specific cases that occur in the first data set "A".

I can't initialize DataFrame "B" with all zeros, because I don't know how many rows it will have.

If I don't initialize the cell values of DataFrame "B", I get this error:

KeyError: "the label ['0'] is not in the [index]"

when I try:

for i in range(len(df_A.index)):
    if int(df_A.iloc[i][3]) == sec_types_crmc[3]:
        df_B.at["'"+str(i)+"'", 'bin_0'] = df_B.loc["'"+str(i)+"'"]['bin_0'] + 1
Laura
  • Won't it have the same number of rows as A? Could you give a toy example of the two DataFrames that you want? – Andy Hayden Feb 08 '19 at 01:47
  • It does seem that `B` would be the same size as `A`. If not, you could always initialize an array that is larger than you need and trim it when you are done. It is quite computationally expensive to constantly change the size of an array. – busybear Feb 08 '19 at 01:50
  • Probably not the same size, because the if statement requires each row to meet some criterion to be included in df_B. Alternatively, you could initialize it at the same size and, at the end, call df_b.dropna(0, 'all') – linamnt Feb 08 '19 at 01:51
  • It is like @linamnt says. I will try your solution, but the thing is that df_A is very, very big... so I don't want to waste computational power if it is not necessary. – Laura Feb 08 '19 at 02:22

1 Answer

  1. See this post.

You can use df.loc[_not_yet_existing_index_label_] = new_row

Looking up a label with df.at or df.loc raises a KeyError when that label does not exist in the DataFrame yet.
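A minimal sketch of the difference between reading and assigning a label that is not in the index yet. The empty frame and the label "0" are made up for illustration; only the column name bin_0 comes from the question.

```python
import pandas as pd

# Hypothetical empty frame with one counter column, named after the question's 'bin_0'.
df_B = pd.DataFrame(columns=["bin_0"])
label = "0"  # illustrative label, not taken from real data

# Reading a label that is not in the index raises KeyError.
try:
    df_B.loc[label, "bin_0"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Assigning a whole row through .loc with a new label appends that row instead.
df_B.loc[label] = [0]

# Now that the row exists, it can be read and incremented.
df_B.loc[label, "bin_0"] += 1
```

Each such assignment grows the frame by one row, so this is convenient for a handful of rows but likely slow inside a loop over a very large df_A.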

  2. Alternatively, since adding new rows one at a time is quite memory intensive, I would highly suggest this solution, which solves your problem though not in the way you asked: create a dictionary and instantiate DataFrame B once you're done iterating over A.
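The dictionary approach could look roughly like the sketch below. The small df_A, its labels, and the value of target are assumptions standing in for the real data and for sec_types_crmc[3]; only the test on the column at position 3 and the column name bin_0 follow the question.

```python
import pandas as pd

# Hypothetical stand-in for df_A: the column at position 3 holds a type code.
df_A = pd.DataFrame(
    [[0, 0, 0, 7], [0, 0, 0, 5], [0, 0, 0, 7]],
    index=["x", "y", "z"],
)
target = 7  # assumed to play the role of sec_types_crmc[3]

# Accumulate counts in a plain dict: row label -> {column name: count}.
counts = {}
for label, row in zip(df_A.index, df_A.itertuples(index=False)):
    if int(row[3]) == target:
        bins = counts.setdefault(label, {"bin_0": 0})
        bins["bin_0"] += 1

# Build df_B once, after the loop, so no rows are appended one at a time.
df_B = pd.DataFrame.from_dict(counts, orient="index")
```

Only rows that pass the test end up in counts, so df_B gets exactly as many rows as needed without knowing that number in advance.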

As @philipzhou mentioned, you can actually use collections.Counter, which automatically tallies how often each item occurs. For example:

import collections
print(collections.Counter(['a', 'b', 'c', 'a', 'b', 'b']))
Output:
Counter({'b': 3, 'a': 2, 'c': 1})

Here you can imagine each letter as a row index that is added to the counter, and counted, as you go through A. Then follow these instructions to turn that into df_B. Basically:

df_B = pd.DataFrame.from_dict(a_counter, orient='index').reset_index()

where a_counter is your collections.Counter object.
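Put together, the conversion might look like this sketch. The letters are the same toy labels as above, and the column names label and count are chosen here only for illustration.

```python
import collections
import pandas as pd

# Toy labels, standing in for whatever is collected while iterating over A.
a_counter = collections.Counter(["a", "b", "c", "a", "b", "b"])

# One row per distinct label; reset_index moves the labels into a regular column.
df_B = pd.DataFrame.from_dict(a_counter, orient="index").reset_index()

# The default column names are 'index' and 0; rename them for readability.
df_B.columns = ["label", "count"]
```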

linamnt