I have a dataset of tweets, and I also have a lexicon in which each word has 100 columns of weights.
If a word in a tweet appears in the lexicon, I want to take that word's weights (100 columns) and add them to the dataset (tweet) as 100 columns.
Note: if other words in the tweet also appear in the lexicon, the weights of all matching words should be summed.
First, I initialize 100 columns and add them to the dataset beside the tweets:
train = pd.read_csv(r"Dataset.csv")
train.shape
#(5000,1)
train.head(3)
# Tweet
# joy, fear
# anger, joy
# sadness
lexicon = pd.read_csv(r"lexicon with PFA.csv")
lexicon.shape
#(10000,101)
lexicon.head(2)
#word w1 w2 w3 .... w100
#joy 0.5 0.1 0 .... 0.2
#fear 0.2 0 0.3 ... 0.1
# Assign columns - all values initially 0 (how can we initialize all of them automatically? a sketch of one way follows this snippet)
train["w1"] = 0
train["w2"] = 0
train["w3"] = 0
train["w4"] = 0
.
.
.
train["w100"] = 0
train.shape
#(5000,101)
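For reference, this is one way I imagine the 100 columns could be created automatically instead of assigning them one by one. It is only a sketch: the names w1..w100 are assumed to match the lexicon's weight columns.

import pandas as pd

# assumed column names w1..w100, matching the lexicon's weight columns
weight_cols = [f"w{i}" for i in range(1, 101)]

# build a block of zeros aligned with the tweets and attach all 100 columns at once
zeros = pd.DataFrame(0.0, index=train.index, columns=weight_cols)
train = pd.concat([train, zeros], axis=1)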
def calcExtraFeatureW1(query):
    lexicon_score_w1 = 0
    # For each word in the tweet
    for i in query.split(" "):
        try:
            # Look the word up in the lexicon; if it is there, add its w1 weight to the score
            sc1 = lexicon[lexicon["word"] == i]["w1"].values[0]  # this works for one column; I want it for all columns
            lexicon_score_w1 += sc1
        except IndexError:
            # The word may not be in the lexicon, just skip it
            pass
    return lexicon_score_w1
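This is roughly how I apply it for the single column (a sketch, using the w1 column created above):

train["w1"] = train["Tweet"].apply(calcExtraFeatureW1)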
Desired output:
#Tweet      w1   w2   w3  ...  w100
#joy,fear   0.7  0.1  0.3 ...  0.3
# note: in this case the weights of joy and fear are summed
Currently it takes the value for just one column and adds it to the dataset, but I want the same process for all columns together.
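To make the goal concrete, here is a rough sketch of the direction I am thinking of for all 100 columns at once. It assumes the lexicon's weight columns are named w1..w100 as above and that each word appears only once in the lexicon; I am not sure this is correct or efficient:

import numpy as np
import pandas as pd

weight_cols = [f"w{i}" for i in range(1, 101)]

# index the lexicon by word so a whole 100-value weight row can be looked up at once
lexicon_weights = lexicon.drop_duplicates("word").set_index("word")[weight_cols]

def calcExtraFeatures(query):
    scores = np.zeros(len(weight_cols))
    # note: split(" ") keeps punctuation attached (e.g. "joy,"), so tokens may need cleaning
    for i in query.split(" "):
        if i in lexicon_weights.index:
            # add the full weight row (w1..w100) of this word
            scores += lexicon_weights.loc[i].to_numpy()
    return pd.Series(scores, index=weight_cols)

# apply per tweet and write all 100 columns back into the dataset
train[weight_cols] = train["Tweet"].apply(calcExtraFeatures)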