I have 1000 texts, each 200-1000 words long. The source text CSV file is about 10 MB. When I vectorize the texts with the code below, the output CSV is exceptionally big (about 2.5 GB). I am not sure what I did wrong. Your help is highly appreciated. Code:
import pandas as pd
from copy import deepcopy
from sklearn.feature_extraction.text import TfidfVectorizer
from numpy import savetxt

df = pd.read_csv('data.csv')
# data has two columns: teks and groups
filtered_df = deepcopy(df)

# fit the TF-IDF vocabulary and transform the texts into a sparse matrix
vectorizer = TfidfVectorizer()
vectorizer.fit(filtered_df["teks"])
vector = vectorizer.transform(filtered_df["teks"])
print(vector.shape)  # shape (1000, 83000)

# densify the sparse matrix and write it to CSV
savetxt('dataVectorized1.csv', vector.toarray(), delimiter=',')
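In case it matters, here is a minimal sketch of an alternative I am considering, assuming a plain CSV is not strictly required and scipy is available: saving the sparse matrix directly with scipy.sparse.save_npz instead of densifying it with toarray(). I have not verified whether this is the right approach.

from scipy import sparse

# `vector` is the sparse matrix returned by vectorizer.transform() above;
# save_npz stores only the non-zero entries instead of the full dense array
sparse.save_npz('dataVectorized1.npz', vector)

# reload later without ever building the dense (1000, 83000) array
vector = sparse.load_npz('dataVectorized1.npz')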