This is my first time asking a question, and English is not my first language, so please forgive any mistakes. I crawled the scripts from some websites and calculated the TF-IDF of the features. Now I want to save the result to a CSV file with all of its rows and columns. Thank you for your help!

import pandas as pd
import nltk
import csv

from sklearn.feature_extraction.text import TfidfVectorizer
data = pd.read_csv("script.csv", header=None)
data.columns = ['website','script']

tfidf2 = TfidfVectorizer(min_df=5,max_df= 0.9,max_features=3000,sublinear_tf=True)
X = tfidf2.fit_transform(data['script'])
df = pd.DataFrame(X.toarray(), columns=tfidf2.get_feature_names())
print(df)
with open ("tf_idf.csv",'a', newline='') as file:
    writer = csv.writer(file)
    writer.writerow([df])
    file.close()

Here is the result:

          10        12        14  ...  undefined       url     width
0   0.109124  0.184763  0.109124  ...   0.229009  0.000000  0.182132
1   0.000000  0.000000  0.000000  ...   0.000000  0.000000  0.000000
2   0.000000  0.000000  0.000000  ...   0.000000  0.146687  0.000000
3   0.186309  0.000000  0.088777  ...   0.000000  0.000000  0.070605
4   0.000000  0.000000  0.000000  ...   0.000000  0.078447  0.000000
5   0.000000  0.145435  0.000000  ...   0.000000  0.226503  0.195839
6   0.000000  0.000000  0.000000  ...   0.125661  0.157894  0.099939
7   0.109124  0.184763  0.109124  ...   0.229009  0.000000  0.182132
8   0.000000  0.000000  0.000000  ...   0.000000  0.000000  0.000000
9   0.000000  0.000000  0.000000  ...   0.000000  0.145549  0.000000
10  0.185179  0.000000  0.088239  ...   0.000000  0.000000  0.070177
11  0.000000  0.000000  0.000000  ...   0.000000  0.078447  0.000000
12  0.000000  0.145435  0.000000  ...   0.000000  0.226503  0.195839
13  0.000000  0.000000  0.000000  ...   0.125661  0.157894  0.099939
14  0.228102  0.108692  0.184031  ...   0.283624  0.136572  0.000000

[15 rows x 80 columns]
[Finished in 5.8s]
Nam Phan
  • You are not saying what the problem is. I assume that you get a Traceback? Can you provide that? I guess that you will need to extract the rows from your df and write each row instead of the entire frame at once. – bohrax May 12 '20 at 16:12
  • May I ask how I can extract each row from the df? – Nam Phan May 13 '20 at 00:39
  • I have never used pandas, but a Google search found this question: https://stackoverflow.com/questions/16923281/writing-a-pandas-dataframe-to-csv-file – bohrax May 13 '20 at 11:35
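
For reference, here is a minimal sketch of the two routes discussed in the comments. It is not a tested answer for the original data. The likely cause of the problem is that `writer.writerow([df])` passes the whole DataFrame as a single cell, so `csv.writer` stores its printed (and truncated) text form instead of the numbers. The small `df` below is a stand-in built from a few values in the printed output, and the file name `tf_idf_rows.csv` is made up for illustration. In the real script, `df` would be the frame built from `X.toarray()`.

```python
import csv

import pandas as pd

# Stand-in for the TF-IDF frame from the question. The values are copied
# from the first two rows of the printed output; the real frame has
# 15 rows and 80 columns.
df = pd.DataFrame(
    [[0.109124, 0.184763, 0.109124],
     [0.000000, 0.000000, 0.000000]],
    columns=["10", "12", "14"],
)

# Route 1 (from the linked question): pandas writes the header and every
# row in one call. index=False leaves out the 0, 1, 2, ... row labels.
df.to_csv("tf_idf.csv", index=False)

# Route 2 (row by row, as suggested in the first comment): write the
# column names first, then one CSV row per DataFrame row.
# Mode "w" overwrites the file; the "a" used in the question appends,
# so repeated runs would stack results in the same file.
with open("tf_idf_rows.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(df.columns)            # header row
    writer.writerows(df.values.tolist())   # data rows
```

Both files should contain the full table regardless of how pandas abbreviates the frame when printing it. The `with` block closes the file on exit, so a separate `file.close()` call is not needed.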
