This is my first time asking a question, and English is not my first language, so please forgive any mistakes. I crawled scripts from several websites and calculated the TF-IDF of the features. Now I want to save the result to a CSV file with all of its rows and columns. Thank you for your help!
import pandas as pd
import nltk
import csv
from sklearn.feature_extraction.text import TfidfVectorizer
data = pd.read_csv("script.csv", header=None)
data.columns = ['website','script']
tfidf2 = TfidfVectorizer(min_df=5, max_df=0.9, max_features=3000, sublinear_tf=True)
X = tfidf2.fit_transform(data['script'])
df = pd.DataFrame(X.toarray(), columns=tfidf2.get_feature_names())
print(df)
with open("tf_idf.csv", 'a', newline='') as file:
    writer = csv.writer(file)
    writer.writerow([df])
    file.close()
Here is the result:
          10        12        14  ...  undefined       url     width
0   0.109124  0.184763  0.109124  ...   0.229009  0.000000  0.182132
1   0.000000  0.000000  0.000000  ...   0.000000  0.000000  0.000000
2   0.000000  0.000000  0.000000  ...   0.000000  0.146687  0.000000
3   0.186309  0.000000  0.088777  ...   0.000000  0.000000  0.070605
4   0.000000  0.000000  0.000000  ...   0.000000  0.078447  0.000000
5   0.000000  0.145435  0.000000  ...   0.000000  0.226503  0.195839
6   0.000000  0.000000  0.000000  ...   0.125661  0.157894  0.099939
7   0.109124  0.184763  0.109124  ...   0.229009  0.000000  0.182132
8   0.000000  0.000000  0.000000  ...   0.000000  0.000000  0.000000
9   0.000000  0.000000  0.000000  ...   0.000000  0.145549  0.000000
10  0.185179  0.000000  0.088239  ...   0.000000  0.000000  0.070177
11  0.000000  0.000000  0.000000  ...   0.000000  0.078447  0.000000
12  0.000000  0.145435  0.000000  ...   0.000000  0.226503  0.195839
13  0.000000  0.000000  0.000000  ...   0.125661  0.157894  0.099939
14  0.228102  0.108692  0.184031  ...   0.283624  0.136572  0.000000
[15 rows x 80 columns]
[Finished in 5.8s]
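For context on why the file does not hold the full table: `writer.writerow([df])` puts the whole DataFrame into a single cell, so `csv` converts it with `str(df)`, which is the same shortened text that `print(df)` shows (including the `...`). A possible way around this, as a minimal sketch, is to let pandas write the file itself with `DataFrame.to_csv`. The small DataFrame, its values, and the temporary output path below are stand-ins I made up for illustration; in the real script `df` would be the TF-IDF table built above and the path would be `tf_idf.csv`.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the TF-IDF table (hypothetical values, not the real data).
df = pd.DataFrame(
    [[0.109124, 0.0, 0.182132], [0.0, 0.146687, 0.0]],
    columns=["10", "url", "width"],
)

# Hypothetical output path; the original script writes to "tf_idf.csv".
out_path = os.path.join(tempfile.mkdtemp(), "tf_idf.csv")

# to_csv writes every row and column, independent of print() display limits.
# index=False leaves out the 0..n-1 row labels; the default mode overwrites.
df.to_csv(out_path, index=False)

# Read the file back to check that the full shape survived the round trip.
restored = pd.read_csv(out_path)
print(restored.shape)
```

If the file should be appended to, as with the `'a'` mode in the original code, `to_csv` also accepts `mode='a'`, but then the header row is written again on every run unless `header=False` is passed.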