I have a dataset of tweets, and I also have a lexicon in which each word has 100 columns of weights.
If a word in a tweet appears in the lexicon, I want to take that word's weights (100 columns) and add them to the dataset (tweet) as 100 columns.
Note: if other words in the tweet also appear in the lexicon, the weights of all matching words should be summed.
First, I initialize 100 columns and add them to the dataset beside the tweets:
train = pd.read_csv(r"Dataset.csv")
train.shape
#(5000,1)
train.head(3)
# Tweet
# joy, fear
# anger, joy
# sadness
lexicon = pd.read_csv(r"lexicon with PFA.csv")
lexicon.shape
#(10000,101)
lexicon.head(2)
#word w1 w2 w3 .... w100
#joy 0.5 0.1 0 .... 0.2
#fear 0.2 0 0.3 ... 0.1
# Assign columns - all values initially 0 (how can we initialize all of them automatically? a sketch of one way follows this snippet)
train["w1"] = 0
train["w2"] = 0
train["w3"] = 0
train["w4"] = 0
.
.
.
train["w100"] = 0
train.shape
#(5000,101)
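For reference, this is one way I imagine the 100 columns could be created automatically instead of assigning them one by one. It is only a sketch: the names w1..w100 are assumed to match the lexicon's weight columns.

import pandas as pd

# assumed column names w1..w100, matching the lexicon's weight columns
weight_cols = [f"w{i}" for i in range(1, 101)]

# build a block of zeros aligned with the tweets and attach all 100 columns at once
zeros = pd.DataFrame(0.0, index=train.index, columns=weight_cols)
train = pd.concat([train, zeros], axis=1)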
def calcExtraFeatureW1(query):
    lexicon_score_w1 = 0
    # For each word in the tweet
    for i in query.split(" "):
        try:
            # Look the word up in the lexicon; if it is there, add its w1 weight to the score
            sc1 = lexicon[lexicon["word"] == i]["w1"].values[0]  # this works for one column; I want it for all columns
            lexicon_score_w1 += sc1
        except IndexError:
            # The word may not be in the lexicon, just skip it
            pass
    return lexicon_score_w1
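This is roughly how I apply it for the single column (a sketch, using the w1 column created above):

train["w1"] = train["Tweet"].apply(calcExtraFeatureW1)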
Desired output:
#Tweet      w1   w2   w3  ...  w100
#joy,fear   0.7  0.1  0.3 ...  0.3
# note: in this case the weights of joy and fear are summed
Currently it takes the value for just one column and adds it to the dataset, but I want the same process for all columns together.
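To make the goal concrete, here is a rough sketch of the direction I am thinking of for all 100 columns at once. It assumes the lexicon's weight columns are named w1..w100 as above and that each word appears only once in the lexicon; I am not sure this is correct or efficient:

import numpy as np
import pandas as pd

weight_cols = [f"w{i}" for i in range(1, 101)]

# index the lexicon by word so a whole 100-value weight row can be looked up at once
lexicon_weights = lexicon.drop_duplicates("word").set_index("word")[weight_cols]

def calcExtraFeatures(query):
    scores = np.zeros(len(weight_cols))
    # note: split(" ") keeps punctuation attached (e.g. "joy,"), so tokens may need cleaning
    for i in query.split(" "):
        if i in lexicon_weights.index:
            # add the full weight row (w1..w100) of this word
            scores += lexicon_weights.loc[i].to_numpy()
    return pd.Series(scores, index=weight_cols)

# apply per tweet and write all 100 columns back into the dataset
train[weight_cols] = train["Tweet"].apply(calcExtraFeatures)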