I am very new to Python and have started working on text data.

I want to add a column to a DataFrame and fill it based on a condition on a different column.

The dataset has 10,000 rows; I shortened it by taking a random sample of 2,000 rows.

I want to add a new column named "Review Sentiment" and fill its cells with 1 if reviews.rating is > 3 and 0 if reviews.rating is <= 3.

Here is what I have tried.

Code:

Dataset = pd.read_csv('Datafiniti_Hotel_Reviews.csv')

Dataset_sample = Dataset.sample(n = 2000)
Dataset_sample.head()

i=0

for i in range(len(Dataset_sample.axes[0])):
            if(Dataset_sample['reviews.rating'] < 3):
                Dataset_sample.insert(len(Dataset_sample.axes[1],"Test",1))
            else:
                Dataset_sample.insert(len(Dataset_sample.axes[1],"Test",0)) 

Error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Dataset: an extract from the dataset is shown below. Please answer using these columns; the logic stays the same.

 ID   province reviews.rating 
 ----------------------------  
 1    CA             5
 7    ST             4
 3    DL             4
 6    YT             5
 5    JD             1
Sid
  • Please post a sample of data which can be copied, not an image. – NYC Coder May 22 '20 at 21:21
  • `Dataset_sample['Test'] = Dataset_sample['reviews.rating'].lt(3).astype(int)`. – Quang Hoang May 22 '20 at 21:21
  • Also, you may want to do `Dataset_sample = Dataset.sample(n=2000).copy()`. – Quang Hoang May 22 '20 at 21:22
  • Please [provide a reproducible copy of the DataFrame with `df.head(10).to_clipboard(sep=',')`](https://stackoverflow.com/questions/52413246/how-to-provide-a-copy-of-your-dataframe-with-to-clipboard). [Stack Overflow Discourages Screenshots](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors). It is likely the question will be down-voted. You are discouraging assistance because no one wants to retype your data or code, and screenshots are often illegible. – Trenton McKinney May 22 '20 at 21:36
  • I have put a snippet from dataset. Hope that helps. – Sid May 23 '20 at 03:57

1 Answer

import pandas as pd

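# Sample data: two buses, each with a size and a cost
dfBuses = pd.DataFrame({'size': [40, 30], 'cost': [500, 400]},
                       index=['bus1', 'bus2'], columns=['size', 'cost'])

print(dfBuses)

# Build the new column row by row: True where cost >= 450, otherwise False
dfBuses['expensive'] = [row['cost'] >= 450 for _, row in dfBuses.iterrows()]

print(dfBuses)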

gives

      size  cost
bus1    40   500
bus2    30   400
      size  cost  expensive
bus1    40   500       True
bus2    30   400      False
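
The same idea can be applied to the question's own column without a loop. The `if` in the original loop fails because `Dataset_sample['reviews.rating'] < 3` is a whole Series of booleans rather than a single True/False value, which is what the ValueError is reporting. The following is a minimal sketch, assuming the ratings are numeric; the small DataFrame `df` is a stand-in built from the five sample rows in the question, not the real CSV.

```python
import pandas as pd

# Stand-in for the hotel reviews data, using the sample rows from the question
df = pd.DataFrame({
    'ID': [1, 7, 3, 6, 5],
    'province': ['CA', 'ST', 'DL', 'YT', 'JD'],
    'reviews.rating': [5, 4, 4, 5, 1],
})

# Compare the whole column at once, then cast the booleans to 1/0:
# 1 where the rating is > 3, 0 where it is <= 3
df['Review Sentiment'] = (df['reviews.rating'] > 3).astype(int)

print(df)
```

On the real data, the same assignment should work on `Dataset_sample` in place of `df`. Taking the sample with `.copy()`, as suggested in the comments, makes it explicit that the new column is being added to an independent DataFrame.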
Alex Fleischer