1

This doesn't work:

def rator(row):
    if row['country'] == 'Canada':
        row['stars'] = 3
    elif row['points'] >= 95:
        row['stars'] = 3
    elif row['points'] >= 85:
        row['stars'] = 2
    else:
        row['stars'] = 1
    return row

with_stars = reviews.apply(rator, axis='columns')

But this works:

def rator(row):
    if row['country'] == 'Canada':
        return 3
    elif row['points'] >= 95:
        return 3
    elif row['points'] >= 85:
        return 2
    else:
        return 1

with_stars = reviews.apply(rator, axis='columns')

I'm practicing on Kaggle, and reading through their tutorial as well as the documentation. I am a bit confused by the concept.

I understand that the apply() method acts on an entire row of a DataFrame, while map() acts on each element in a column. And that it's supposed to return a DataFrame, while map() returns a Series.

Just not sure how the mechanics work here, since it's not letting me return rows inside the function...

some of the data:

    country description designation points  price   province    region_1    region_2    taster_name taster_twitter_handle   title   variety winery
0   Italy   Aromas include tropical fruit, broom, brimston...   Vulkà Bianco    -1.447138   NaN Sicily & Sardinia   Etna    NaN Kerin O’Keefe   @kerinokeefe    Nicosia 2013 Vulkà Bianco (Etna)    White Blend Nicosia
1   Portugal    This is ripe and fruity, a wine that is smooth...   Avidagos    -1.447138   15.0    Douro   NaN NaN Roger Voss  @vossroger  Quinta dos Avidagos 2011 Avidagos Red (Douro)   Portuguese Red  Quinta dos Avidagos

Index(['country', 'description', 'designation', 'points', 'price', 'province',
       'region_1', 'region_2', 'taster_name', 'taster_twitter_handle', 'title',
       'variety', 'winery'],
      dtype='object')

https://www.kaggle.com/residentmario/summary-functions-and-maps

avnav99
  • 1
    You mention `map` but you don't show a map implementation. Also can you provide a sample of the `reviews` DataFrame to make your code reproducible? – Henry Ecker Dec 04 '21 at 23:52
  • What is the error? –  Dec 04 '21 at 23:53
  • @user17242583 not really sure it never loads...its the last question in the https://www.kaggle.com/learn/pandas 3rd lesson – avnav99 Dec 04 '21 at 23:56
  • @HenryEcker hey thanks for taking the time, i can add something to the question but its probably easier to copy it from kaggle - but ill try to add a couple of rows! – avnav99 Dec 04 '21 at 23:57
  • that link at the bottom links to the tutorial which has the table at the top – avnav99 Dec 05 '21 at 00:00

2 Answers

1

When you use apply, the function is called once for each row (or column, depending on the axis parameter). When that function returns a scalar, as your second version does, the return value of apply is not a DataFrame but a Series built from those return values. Your second piece of code therefore returns the star rating of each row, and those ratings form a new Series. A better name for the result is star_ratings rather than with_stars.
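As a minimal sketch of this behaviour, here is a tiny made-up reviews frame (the three rows and their values are assumptions, not the Kaggle data). A function that returns a scalar per row gives back a Series, while a function that returns a Series per row gives back a DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle `reviews` data (values are made up).
reviews = pd.DataFrame({
    "country": ["Italy", "Canada", "Portugal"],
    "points": [96, 80, 87],
})

def rator(row):
    # Returns one scalar per row.
    if row["country"] == "Canada":
        return 3
    if row["points"] >= 95:
        return 3
    if row["points"] >= 85:
        return 2
    return 1

# Scalar per row: apply collects the scalars into a Series.
star_ratings = reviews.apply(rator, axis="columns")
print(type(star_ratings).__name__)  # Series
print(star_ratings.tolist())        # [3, 3, 2]

# Series per row: apply stacks the returned rows into a DataFrame.
echoed = reviews.apply(lambda row: row, axis="columns")
print(type(echoed).__name__)        # DataFrame
```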

If you want to append this Series to your original dataframe you can use:

star_ratings = reviews.apply(rator, axis='columns')
reviews['stars'] = star_ratings

or, more succinctly:

reviews['stars'] = reviews.apply(rator, axis='columns')

As for why your first piece of code does not work: it tries to add a new column to the row it is given, and you are not supposed to mutate the passed object. The official docs state:

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported

To better understand the differences between map and apply please see the different responses to this question, as they present many different and correct viewpoints.
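For a rough feel of the difference, the sketch below uses a two-row made-up frame (the data and the lambda bodies are illustrative assumptions). Series.map calls the function once per element of a single column, while DataFrame.apply with axis='columns' calls it once per row:

```python
import pandas as pd

# Hypothetical two-row frame; the values are made up for illustration.
reviews = pd.DataFrame({"country": ["Italy", "Canada"], "points": [96, 80]})

# Series.map: the function receives one element of the column at a time.
mean_points = reviews["points"].mean()
centered = reviews["points"].map(lambda p: p - mean_points)
print(centered.tolist())  # [8.0, -8.0]

# DataFrame.apply with axis="columns": the function receives one row at a time,
# so it can combine several columns.
labels = reviews.apply(
    lambda row: f"{row['country']}: {row['points']}", axis="columns"
)
print(labels.tolist())  # ['Italy: 96', 'Canada: 80']
```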

azelcer
0

You shouldn't use apply with a function that modifies the input. You could change your code to this:

def rator(row):
    new_row = row.copy()
    if row['country'] == 'Canada':
        new_row['stars'] = 3
    elif row['points'] >= 95:
        new_row['stars'] = 3
    elif row['points'] >= 85:
        new_row['stars'] = 2
    else:
        new_row['stars'] = 1
    return new_row

with_stars = reviews.apply(rator, axis='columns')

However, it's simpler to return just the value you care about than to return an entire row just to add one column. If you write rator to return only the star rating but still want a complete DataFrame, you can do with_stars = reviews.copy() and then with_stars['stars'] = reviews.apply(rator, axis='columns'). Also, when an if branch ends with a return, the branches after it can be plain if rather than elif. You can also simplify the binning of points with pandas.cut.
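One possible way to do the cut-based version is sketched below. The sample rows are made up, and handling the Canada rule with Series.mask is my own choice rather than something from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Kaggle `reviews` data (values are made up).
reviews = pd.DataFrame({
    "country": ["Italy", "Canada", "Portugal", "France"],
    "points": [96, 80, 87, 84],
})

# Bin points into [-inf, 85), [85, 95), [95, inf) and label the bins 1, 2, 3.
# right=False makes each bin closed on the left, matching the >= comparisons.
stars = pd.cut(
    reviews["points"],
    bins=[-np.inf, 85, 95, np.inf],
    labels=[1, 2, 3],
    right=False,
).astype(int)

# Canadian wines always get 3 stars, as in the original rator.
stars = stars.mask(reviews["country"] == "Canada", 3)

# Work on a copy so the original frame is left untouched.
with_stars = reviews.copy()
with_stars["stars"] = stars
print(with_stars["stars"].tolist())  # [3, 3, 2, 1]
```

This avoids calling a Python function once per row, which tends to matter on a frame as large as the Kaggle one.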

Acccumulation