
I have written code that labels text data by aspect, using lists of aspect terms, and then by sentiment with the VADER lexicon. But the result only contains -1, meaning negative, and 1, meaning positive, whereas there should be three classes: positive, negative and neutral.

Here is the code:

import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Define the aspect keywords
system_keywords = ['server', 'bug', 'error', 'sinyal', 'jaringan', 'login', 'update', 
                   'perbaruan', 'loading', 'aplikasi', 'fitur', 'UI/UX' , 'tampilan', 
                   'data', 'otp', 'keamanan']
layanan_keywords = ['customer service', 'cs', 'call center', 'telepon', 'email', 'beli', 
                    'pertalite', 'bbm', 'topup']
transaksi_keywords = ['cash', 'cashless', 'debit', 'tunai', 'scan', 'e-wallet', 
                      'linkaja', 'link', 'bayar', 'ovo', 'transaksi', 'pembayaran', 
                      'cashback', 'struk', 'tunai', 'nontunai']
subsidi_keywords = ['verifikasi', 'data', 'form', 'formulir', 'daftar', 'subsidi', 
                    'pendaftaran', 'subsidi', 'kendaraan', 'formulir', 'stnk', 'ktp', 
                    'nopol', 'no', 'kendaraan', 'nomor', 'polisi', 'foto', 'kendaraan', 
                    'alamat', 'provinsi', 'kota', 'kabupaten', 'kecamatan']
kebermanfaatan_keywords = ['bagus', 'mantap', 'recommend', 'oke', 'mudah', 'berguna', 
                           'membantu', 'simple', 'guna', 'bantu']

# Define a function to label the aspect based on the aspect keywords
def label_aspect(text):
    aspect_labels = [0] * 5 # Initialize all aspect labels to 0
    for i, keywords in enumerate([system_keywords, layanan_keywords, transaksi_keywords, 
           subsidi_keywords, kebermanfaatan_keywords]):
        for keyword in keywords:
            if keyword in text:
                aspect_labels[i] = 1
                break
    return aspect_labels

# Load the data into a DataFrame
data = {'content': ['Sejak menggunakan aplikasi mypertamina beli pertalite jadi lebih simple dan mudah karena aplikasi ini bener bener membantu untuk meringankan penjual dan pembeli recomend bisa bayar pakai tunai atau nontunai mantepp', 
                    'sering ada bug, aplikasi tidak user friendly. bingung dalam menginput data untuk subsidi. tidak ada notifikasi apakah data inputan sudah masuk atau belum. Tolong diperbaiki',
                    'Bagus juga aplikasi, kalo ada promo seperti ini kan para pemakai premium bisa jadi beralih ke pertalite bahkan pertamax. Coba ada promo2 lainnya seperti kerja sama dg situs belanja online ya min. Pertahankan min',
                    'kadang sulit di akses terakhir ada perintah update MyPertamina, saya ikuti, setelah update, jadi sulit masuk seolah data tidak ada, malah QR code tidak bisa muncul, dan belum sempat saya print',
                    'buruk, sudah coba daftar berkali kali tetap gak bisa. Mau beli bbm harus ada barcode, daftar susah ah bukan nya memudahkan rakyat malah tambah mempersulit']}
df = pd.DataFrame(data)

# Create a VADER sentiment analyzer (from the vaderSentiment package)
vader_lexicon = SentimentIntensityAnalyzer()

# Add the aspect columns to the DataFrame and label them
aspect_labels = df['content'].apply(label_aspect)
df['sistem'], df['layanan'], df['transaksi'], df['pendaftaran subsidi'], df['kebermanfaatan'] = zip(*aspect_labels)

# Apply Vader sentiment analysis to label the aspect columns
for col in ['sistem', 'layanan', 'transaksi', 'pendaftaran subsidi', 'kebermanfaatan']:
    df[col] = df['content'].apply(lambda x: 1 if vader_lexicon.polarity_scores(x) 
              ['compound'] >= 0.05 and df[col][0] == 1 else (-1 if 
              vader_lexicon.polarity_scores(x)['compound'] <= -0.05 and df[col][0] == 1 
              else 0))

# Display the resulting DataFrame
df

Here is the output:
[image: resulting DataFrame]

The results are still not correct, as the example data show:

  • "Sejak menggunakan aplikasi mypertamina beli pertalite jadi lebih simple dan mudah karena aplikasi ini bener bener membantu untuk meringankan penjual dan pembeli recomend bisa bayar pakai tunai atau nontunai mantepp". None of the words in subsidi_keywords appear in this sentence, yet the "pendaftaran subsidi" column contains 1; it should contain 0.
  • "sering ada bug, aplikasi tidak user friendly. bingung dalam menginput data untuk subsidi. tidak ada notifikasi apakah data inputan sudah masuk atau belum. Tolong diperbaiki". None of the words in transaksi_keywords, layanan_keywords or kebermanfaatan_keywords appear in this sentence, yet the "transaksi", "layanan" and "kebermanfaatan" columns contain 1; they should contain 0.
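Note also that `keyword in text` in label_aspect is a plain, case-sensitive substring test. In the first example sentence the subsidi keyword 'no' occurs inside 'nontunai', so "pendaftaran subsidi" is flagged for that sentence even once the indexing problem discussed in the answer is fixed; conversely, 'Bagus' at the start of the third sentence does not match the keyword 'bagus'. A minimal sketch of whole-word, case-insensitive matching, assuming that is the intended behaviour, compiles one regular expression per aspect. `ASPECT_KEYWORDS` is a hypothetical, shortened version of the question's keyword lists, and `compile_patterns` / `label_aspect_words` are illustrative names, not part of the original code.

```python
import re

# Hypothetical, shortened versions of the question's keyword lists
ASPECT_KEYWORDS = {
    'sistem': ['bug', 'aplikasi', 'data', 'UI/UX'],
    'pendaftaran subsidi': ['data', 'daftar', 'subsidi', 'no', 'nomor'],
    'kebermanfaatan': ['bagus', 'mudah', 'membantu'],
}


def compile_patterns(aspect_keywords):
    """Build one case-insensitive, whole-word regex per aspect."""
    patterns = {}
    for aspect, keywords in aspect_keywords.items():
        # re.escape guards against regex metacharacters inside keywords
        alternatives = '|'.join(re.escape(k) for k in keywords)
        patterns[aspect] = re.compile(r'\b(?:' + alternatives + r')\b', re.IGNORECASE)
    return patterns


PATTERNS = compile_patterns(ASPECT_KEYWORDS)


def label_aspect_words(text):
    """Return 1 for each aspect with a whole-word keyword match, else 0."""
    return [1 if pattern.search(text) else 0 for pattern in PATTERNS.values()]


sentence = 'bisa bayar pakai tunai atau nontunai'
print('no' in sentence)                           # True: substring inside 'nontunai'
print(label_aspect_words(sentence))               # [0, 0, 0]
print(label_aspect_words('Bagus juga aplikasi'))  # [1, 0, 1]
```

With whole-word matching a keyword no longer matches longer words that contain it (for example 'bantu' would not match 'membantu'), so such variants have to be listed explicitly, as the question's lists already partly do.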
  • I don't have access to Vader, but your code seems to work for me when I replace that with `random.random()*0.1`. I get values of 1, 0 and -1 as expected in the output. – Nick Mar 12 '23 at 00:31
  • Note: I would recommend replacing your lambda function with a defined one so you can avoid making two calls to `vader_lexicon.polarity_scores(x)`, or perhaps rewrite it to `0 if df[col] == 0 else (1 if vader_lexicon.polarity_scores(x) >= 0.05 else -1)` – Nick Mar 12 '23 at 00:33
  • Sorry Nick, I still don't understand your answer. Can you provide the code? – Annisa Lianda Mar 12 '23 at 06:44
  • `df[col] = df['content'].apply(lambda x: 0 if df[col] == 0 else (1 if vader_lexicon.polarity_scores(x) >= 0.05 else -1))` – Nick Mar 12 '23 at 06:48
  • I get an error like this: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). – Annisa Lianda Mar 12 '23 at 06:54
  • Did you not get that error with your code? – Nick Mar 12 '23 at 22:15
  • I replaced my older code with your code, but I get an error like this: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). – Annisa Lianda Mar 15 '23 at 02:32
  • Ah, sorry, I didn't test my code properly. Actually it can be simplified as `df[col] = np.where(df[col] == 0, 0, df['content'].apply(lambda x: 1 if vader_lexicon.polarity_scores(x)['compound'] >= 0.05 else -1))`. The last expression is actually a constant so you could just compute it before the `for` loop e.g. `compound = df['content'].apply(lambda x: 1 if vader_lexicon.polarity_scores(x)['compound'] >= 0.05 else -1); for col in ['sistem', 'layanan', 'transaksi', 'pendaftaran subsidi', 'kebermanfaatan']: df[col] = np.where(df[col] == 0, 0, compound)` – Nick Mar 15 '23 at 03:06
  • Ahh, thank you Nick, it works! Once again, thank you, you were very helpful. – Annisa Lianda Mar 15 '23 at 03:42
  • Great to hear - I will post as an answer... – Nick Mar 15 '23 at 03:56
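The ValueError reported in the comments comes from putting a whole Series (`df[col] == 0`) into a Python conditional expression, which needs a single True or False; np.where instead evaluates the condition element by element. A small sketch with made-up flags and sentiment labels (no VADER needed; `flags` and `labels` are hypothetical stand-ins for one aspect column and the per-row sentiment):

```python
import numpy as np
import pandas as pd

flags = pd.Series([1, 0, 1])     # hypothetical aspect flags, one per review
labels = pd.Series([1, -1, -1])  # hypothetical sentiment labels, one per review

# A conditional expression needs one bool, but `flags == 0` is a whole Series,
# so pandas refuses to guess and raises the ValueError seen in the comments.
error_message = None
try:
    result = 0 if flags == 0 else labels
except ValueError as exc:
    error_message = str(exc)

# np.where picks 0 or the label row by row, using each row's own flag.
fixed = np.where(flags == 0, 0, labels)

print(error_message)
print(fixed.tolist())  # [1, 0, -1]
```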

1 Answer


Your issue is that you always use df[col][0], the flag from the first row, to test against 0 or 1, whereas you should be using the flag from the same row as the content being scored. You can work around that by using np.where to do the computation. Note that the Vader result you are testing does not vary from column to column, so you can compute it once, outside the loop:

import numpy as np

# 1 if VADER's compound score is >= 0.05, otherwise -1 (one value per row)
compound = df['content'].apply(lambda x: 1 if vader_lexicon.polarity_scores(x)['compound'] >= 0.05 else -1)
for col in ['sistem', 'layanan', 'transaksi', 'pendaftaran subsidi', 'kebermanfaatan']:
    df[col] = np.where(df[col] == 0, 0, compound)
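The compound line above maps every score below 0.05, including neutral ones, to -1, so it still yields only two sentiment classes where the question asks for three. A sketch that keeps the question's original thresholds (>= 0.05 positive, <= -0.05 negative, neutral otherwise) can use np.select. As a hypothetical choice not made in the answer, it also marks aspects absent from a review as NaN so they are not confused with a neutral 0. The `compound_stub` column is an assumption standing in for `vader_lexicon.polarity_scores(x)['compound']` so the example runs without vaderSentiment; in the real code those scores would be computed once per row, which also avoids the duplicate polarity_scores calls mentioned in the comments.

```python
import numpy as np
import pandas as pd

# Hypothetical data: compound_stub replaces vader_lexicon.polarity_scores(x)['compound']
df = pd.DataFrame({
    'content': ['review a', 'review b', 'review c'],
    'compound_stub': [0.6, -0.4, 0.0],
    'sistem': [1, 1, 0],   # aspect flags as returned by label_aspect
    'layanan': [1, 0, 0],
})

# Three classes with the question's thresholds: 1 positive, -1 negative, 0 neutral
sentiment = np.select(
    [df['compound_stub'] >= 0.05, df['compound_stub'] <= -0.05],
    [1, -1],
    default=0,
)

for col in ['sistem', 'layanan']:
    # Each row uses its own aspect flag; absent aspects become NaN, not 0
    df[col] = np.where(df[col] == 1, sentiment, np.nan)

print(df[['sistem', 'layanan']])
```

Using NaN turns the aspect columns into floats; a different sentinel value would keep them as integers if that matters downstream.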
  • Hey again @Nick, if you are not busy, would you please help me again with this question? https://stackoverflow.com/questions/75803977/flask-datatables-not-showing-data Thank you. – Annisa Lianda Mar 21 '23 at 16:40
  • @AnnisaLianda sorry, I'm not familiar with Flask so I don't think I can help you. – Nick Mar 21 '23 at 22:51