I created a sub-dataframe (drama_df) by filtering the original dataframe (df) on a criterion. However, I can't access a cell using the usual drama_df['summary'][0]. Instead I get KeyError: 0. I'm confused, since type(drama_df) is a DataFrame. What do I do? Note that df['summary'][0] does return a string.

drama_df = df[df['drama'] > 0]

#Now we generate a lump of text from the summaries
drama_txt = ""
i = 0
while (i < len(drama_df)):
    drama_txt = drama_txt + " " + drama_df['summary'][i]
    i += 1

EDIT: Here is an example of df: (image not included)

Here is an example of drama_df: (image not included)

handavidbang
  • Can you please add an example of `df` and `drama_df`? – Marco Feb 22 '18 at 19:43
  • @Marco I uploaded pictures. – handavidbang Feb 22 '18 at 19:49
  • Possible duplicate of [KeyError: 0 when accessing value in pandas series](https://stackoverflow.com/questions/46153647/keyerror-0-when-accessing-value-in-pandas-series) – Georgy Feb 22 '18 at 19:50
  • Please, for future questions, do not upload pictures but instead share data, e.g. `df[['drama','summary']].head().to_dict()`, or copy-paste the result of `print(df[['drama','summary']].head())` – Anton vBR Feb 22 '18 at 20:14

2 Answers

This will solve it for you:

drama_df['summary'].iloc[0]

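Filtering keeps the original index labels, so when you created the sub-DataFrame the row labelled 0 was probably filtered out. drama_df['summary'][0] looks up the label 0, not the first row, which is why it raises KeyError: 0. Use iloc to get the element by position rather than by index label.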
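To make the label-versus-position distinction concrete, here is a minimal sketch. The data is hypothetical (the question's real df was only shown as images), and reset_index(drop=True) is offered as an alternative that is not part of the answer above: it renumbers the rows from 0 so that the original loop would work unchanged.

```python
import pandas as pd

# Hypothetical stand-in for the question's df; the real data is not shown.
df = pd.DataFrame({
    "drama": [0, 2, 1],
    "summary": ["a comedy", "a sad story", "a tense story"],
})

# Boolean filtering keeps the original index labels (here 1 and 2).
drama_df = df[df["drama"] > 0]
remaining_labels = list(drama_df.index)

# Label-based lookup fails because no row is labelled 0 any more.
label_lookup_failed = False
try:
    drama_df["summary"][0]
except KeyError:
    label_lookup_failed = True

# Position-based lookup returns the first remaining row.
first_by_position = drama_df["summary"].iloc[0]

# Alternative (an assumption, not from the answer): renumber the rows from 0.
reset_df = drama_df.reset_index(drop=True)
first_after_reset = reset_df["summary"][0]
```

Which of the two fits better depends on whether the original row labels are still needed later; iloc leaves them intact, while reset_index(drop=True) discards them.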

You can also loop over the rows with .iterrows() or .itertuples(). itertuples() is a lot faster, but it takes a bit more work to handle if you have a lot of columns:

# iterrows() yields (index, row) pairs, so unpack both
for index, row in drama_df.iterrows():
    drama_txt = drama_txt + " " + row['summary']

To go faster:

for index, summary in drama_df[['summary']].itertuples():
    drama_txt = drama_txt + " " + summary
joaoavf

Wait a moment here. You are looking for the str.join() operation.

Simply do this:

drama_txt = ' '.join(drama_df['summary'])

Or:

drama_txt = drama_df['summary'].str.cat(sep=' ')
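The two one-liners behave differently when a summary is missing, which may matter for real data. The sketch below uses a hypothetical Series with one missing value; nothing in the question says whether the actual summaries contain gaps. str.cat omits missing values by default, whereas ' '.join() needs them dropped first because it only accepts strings.

```python
import pandas as pd

# Hypothetical summaries with non-contiguous labels and one missing value.
summaries = pd.Series(
    ["a sad story", None, "a tense story"],
    index=[1, 4, 7],
)

# str.cat skips missing values when na_rep is left at its default.
joined_with_cat = summaries.str.cat(sep=" ")

# str.join raises TypeError on non-string items, so drop them first.
joined_with_join = " ".join(summaries.dropna())
```

Neither approach depends on the index labels, so the KeyError from the question does not arise.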
Anton vBR