I have an excel file that includes 10K rows, each row has some information for a tweet. For example these columns: Tweet, Date of Tweet, User Name, Retweet Count, ..., User Location, Sentiment(the value in this column is Positive or negative or neutral), State(the value of this column is 50 states of the USA), Abbreviation (the value of this column is the abbreviation of state such as CA, NJ, NY,..), CountofNegative (this column is empty and I wish to write the number of Negative tweets for each state in this column, so this column will have 50 numbers).
in following you can see the screenshot of this dataset:
Problem: count the number of negative tweets for each state or its abbreviation and write in CountofNegative column. following is my code:
import pandas as pd
file=pd.read_excel("C:/Users/amtol/Desktop/Project/filter.xlsx")
UserLocation= file["User Location"]
Sentiment= file["Sentiment"]
CountofNegative= file["CountofNegative"]
State=file["State"]
Abbreviation= file["Abbreviation"]
for i, (loc,sent) in enumerate(zip(UserLocation, Sentiment)):
count=0
for j, (state, abbr) in enumerate(zip(State, Abbreviation)):
if (loc == state or loc == abbr and sent == "Negative"):
count=count+1
file.loc[j+1,"CountofNegative"]=count
print(CountofNegative)
file.to_excel("C:/Users/amtol/Desktop/Project/filter.xlsx")
there is no error but when the output file is created, the first 24 values for column "CountofNegative" are zeros and the rest of them are ones(which they are not correct answers). Also, I wanted to test the program by print(CountofNegative)
, which still nothing happened. (no output). how can I fix my code?