I am learning how to analyze a large volume of comments, and I asked ChatGPT to generate the code for me.
Now I can't figure out why Python can't find some of the modules even though I installed them in my venv with pip install pandas nltk scikit-learn gensim. I am fairly sure they are installed correctly, because they all show up in pip list.
I am using Python 3.11.4.
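In case it is useful, here is a minimal diagnostic sketch (not part of the original program, just something that can be dropped at the top of the script) to show which interpreter is actually executing it and whether the venv is active:

import sys

# Full path of the python.exe that is actually running this script
print(sys.executable)
# Environment root; inside an active venv this should point at the venv folder
print(sys.prefix)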
Here is the error message:
(data_analysis) D:\> py "D:\New folder\Main Program"
Traceback (most recent call last):
  File "D:\New folder\Main Program", line 1, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
And here is my code:
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from gensim.summarization import summarize  # NOTE: gensim.summarization only exists in gensim < 4.0; it was removed in 4.x
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('vader_lexicon')
# Preprocess data
def preprocess_text(text):
    # Tokenize, keep only alphabetic tokens, lowercase them, and drop English stopwords
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text)
    words = [word.lower() for word in words if word.isalpha()]
    words = [word for word in words if word not in stop_words]
    return ' '.join(words)
# Load the comments data into a DataFrame (assuming a 'comment' column)
data = pd.read_csv('comments.csv')
data['cleaned_comment'] = data['comment'].apply(preprocess_text)
# Sentiment Analysis
sia = SentimentIntensityAnalyzer()
data['sentiment_score'] = data['cleaned_comment'].apply(lambda x: sia.polarity_scores(x)['compound'])
# Topic Modeling using LDA
vectorizer = CountVectorizer(max_df=0.8, min_df=2, stop_words='english')
doc_term_matrix = vectorizer.fit_transform(data['cleaned_comment'])
lda_model = LatentDirichletAllocation(n_components=5, random_state=42)
lda_model.fit(doc_term_matrix)
data['topic'] = lda_model.transform(doc_term_matrix).argmax(axis=1)
# Extractive Summarization
# NOTE: gensim's summarize() raises a ValueError on very short inputs (it needs more than one sentence)
data['summary'] = data['comment'].apply(lambda x: summarize(x, ratio=0.3))
# Manual Review and Visualization
for index, row in data.iterrows():
print(f"Comment {index+1} - Sentiment: {row['sentiment_score']:.2f}, Topic: {row['topic']}")
print("Original Comment:", row['comment'])
print("Summary:", row['summary'])
print("="*50)
# Save summarized data to a new CSV file
data.to_csv('summarized_comments.csv', index=False)