I'm trying to write code that analyzes a dataframe row by row. It looks at the sentence in each row and applies a bag-of-words approach to create new columns to be used as features for regression analysis.

Below is what I'm trying to replicate. Generating the features works, but I'm having a hard time keeping them aligned with the row of the dataframe that I called apply on.

Take a look at this sample:

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Function I'm trying to create
vectorizer = CountVectorizer()

def per_row(row):
    # One-document corpus: the comma-separated tags of this row
    corpus = [row['bb']]
    # Refitting here means every row gets its own vocabulary (its own columns)
    bag_of_words = vectorizer.fit_transform(corpus)
    # get_feature_names() on scikit-learn older than 1.0
    feature_names = vectorizer.get_feature_names_out()
    display(pd.DataFrame(bag_of_words.toarray(), columns=feature_names))
    # return pd.DataFrame(bag_of_words.toarray(), columns=feature_names)

# Create data frame
a = pd.DataFrame([['a', 'Fast_Food,Budget_Friendly,Pasta'],
                  ['b', 'Fast_Food,Asean,Pasta']
                 ], columns=['aa', 'bb'])
display('initial df', a)

# So far this is the only approach I can think of to add new columns,
# but how can I make it dynamic, so that the output
# is joined onto the original df (a)?

a.apply(per_row, axis=1)


# This is my desired outcome after the script runs:
# the categories in each row become
# dummy variables/features for use in regression.

desired_outcome = pd.DataFrame([['a', 'Fast_Food,Budget_Friendly,Pasta', 1, 1, 1, 0],
                                ['b', 'Fast_Food,Asean,Pasta', 0, 1, 1, 1]
                               ],
                               columns=['aa', 'bb', 'budget_friendly', 'fast_food', 'pasta', 'asean'])

desired_outcome

I need help fixing the per_row function so that it joins every new feature created by the bag-of-words vectorizer onto the original dataframe.

If there's a package that already performs this process, that would be even better. Thanks in advance.

desertnaut
Kel
  • Possible duplicate of [Pandas convert a column of list to dummies](https://stackoverflow.com/questions/29034928/pandas-convert-a-column-of-list-to-dummies) – G. Anderson Aug 21 '19 at 18:31
  • Can you add the imports you use and the initial dataframe or a small part of it? Without this, I cannot test your code. – ndclt Aug 21 '19 at 19:49
  • Hi Sorry, yea forgot about adding the packages used. edited the OP. i used pandas and countvectorizer from sklearn – Kel Aug 22 '19 at 02:30
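
In the spirit of the dummies approach from the linked duplicate, here is a minimal pandas-only sketch. It assumes the tags in bb are always separated by a plain comma with no surrounding spaces, and the names dummies and result are only illustrative.

```python
import pandas as pd

a = pd.DataFrame([['a', 'Fast_Food,Budget_Friendly,Pasta'],
                  ['b', 'Fast_Food,Asean,Pasta']],
                 columns=['aa', 'bb'])

# Lowercase the tags, then split on ',' and build one 0/1 column per tag.
dummies = a['bb'].str.lower().str.get_dummies(sep=',')

# get_dummies keeps the original index, so a plain join stays row-aligned.
result = a.join(dummies)
print(result)
```

This skips scikit-learn entirely, which may be enough when the column holds delimited labels rather than free text.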

0 Answers