I have a large matrix imported from a .csv file with more than 50,000 lines.

I am working with pandas and NumPy. The matrix is a film database, and I would like to add a new conditional column.

One of the matrix columns is genres, a single string listing several genres. I want to create a new column called "Drama_yes_or_no" using a condition on that column: if the string contains "Drama", write 1.

I am trying the code below, but I get this error: ("argument of type 'float' is not iterable", u'occurred at index 424')

def dram_genres(passenger):
    original_title, genres = passenger
    #if genres.find('Drama') != -1:
    if "Drama" in genres: 
        return 'Drama'
    else:
        return 'Not Drama'


# adds a new column to the dataframe specifying whether the film is a drama
IMDb_data['Drama_or_not'] = IMDb_data[['original_title', 'genres']].apply(dram_genres, axis=1)

IMDb_data[['original_title', 'genres', 'budget','vote_average','Drama_or_not']].head(7)

Could you help me, please?

Thanks in advance

meskone

1 Answer

If I understand you correctly, you can do the same thing with the pandas string-processing methods:

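import pandas as pd

df = pd.DataFrame({'genre':['Action', 'Drama', 'Drama ', 
                             ' Drama', 'Western', 'Other Drama', 10]})

# str.find returns 0 for a match at the start of the string, so compare with >= 0
df['Drama_or_not'] = df['genre'].str.find('Drama') >= 0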

This should address your error as well:

"argument of type 'float' is not iterable".

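I imagine this error arises in the fourth line of your function (`if "Drama" in genres:`), because for some rows genres is a float rather than an iterable object (e.g., a string or a list). Missing values in pandas are typically stored as NaN, which is a float.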
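A minimal sketch of why the check fails, assuming the problem rows hold NaN in genres. The helper name `is_drama` and the sample values are hypothetical, not taken from your data:

```python
# Assumption: an empty genres cell ends up as NaN, which is a plain float.
missing = float("nan")

try:
    "Drama" in missing  # the membership test needs an iterable, not a float
except TypeError as exc:
    error_message = str(exc)

def is_drama(genres):
    # Only test membership on real strings; NaN and other non-strings count as "not Drama".
    return isinstance(genres, str) and "Drama" in genres

labels = [is_drama(g) for g in ["Drama|Crime", "Action", missing]]
```

A function guarded like this could be passed to `apply` in place of the original check, although the vectorised `.str` methods are usually simpler.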

Be careful, though: if a column that is meant to hold only strings contains float values, you should first examine and clean the data so that you understand why this is the case.
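One possible way to examine the column first, sketched on a small made-up frame (`films` is a hypothetical stand-in for IMDb_data):

```python
import pandas as pd

# Hypothetical sample data; the real frame would come from the CSV.
films = pd.DataFrame({"genres": ["Drama|Crime", None, "Action", float("nan")]})

# Count the missing entries in the column.
n_missing = films["genres"].isna().sum()

# Select the rows whose genres value is not a string, to inspect them by hand.
non_string = films[~films["genres"].map(lambda g: isinstance(g, str))]
```

Looking at the rows in `non_string` should show whether the floats are simply empty cells or something else that needs cleaning.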

FChm
  • Thanks, it works! And what if you want to combine two conditions on two different columns? IMDb_data['Drama_or_not'] = (IMDb_data['genres'].str.find('Drama')>0 and IMDb_data['cast'].str.find('Robert De Niro')>0) – meskone Feb 25 '19 at 12:40
  • You can use the element-wise `&` operator instead of `and` for this; see [this SO post](https://stackoverflow.com/questions/21415661/logical-operators-for-boolean-indexing-in-pandas). For example `df['new_column'] = (df['first_column']==value) & (df['second_column']==other_value)` – FChm Feb 25 '19 at 13:55
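The two-column condition from the comments could look roughly like this. The sample rows and the column name `Drama_with_De_Niro` are made up for illustration, and `str.contains` is swapped in for `str.find` so that missing values can be treated as False via `na=False`:

```python
import pandas as pd

# Hypothetical sample data with the same column names as in the question.
films = pd.DataFrame({
    "genres": ["Drama|Crime", "Comedy", "Drama", None],
    "cast": ["Robert De Niro|Al Pacino", "Robert De Niro",
             "Meryl Streep", "Robert De Niro"],
})

# Build one boolean Series per condition; na=False maps missing values to False.
is_drama = films["genres"].str.contains("Drama", na=False, regex=False)
has_de_niro = films["cast"].str.contains("Robert De Niro", na=False, regex=False)

# Combine element-wise with & (Python's `and` does not work on Series).
films["Drama_with_De_Niro"] = is_drama & has_de_niro
```

Appending `.astype(int)` to the combined Series would give 1/0 instead of True/False, if that is what the new column should hold.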