Hi, I am working on a data transformation project. I am taking in a CSV that has 1 million records and trying to segregate them into individual txt files (a simplified sketch of that last step is at the bottom of this post). The problem is that it takes a lot of time to process; we're talking more than 5 minutes per column here. My code is below:
import pandas as pd
print("Reading CSV")
data_set = pd.read_csv(address_file_path, low_memory=False, index_col=1)
print("Reading Completed")
a_name = set(data_set.loc[:, 'A'])
print("A done")
b_name = set(data_set.loc[:, 'B'])
print("B Done")
c_name = set(data_set.loc[:, 'C'])
print("C Done")
d_name = set(data_set.loc[:, 'D'])
print("D done")
e_name = set(data_set.loc[:, 'E'])
print("E done")
f_name = set(data_set.loc[:, 'F'])
print("F done")
print("Data Transformed")
It does A quite quickly: even though the pandas.Series has 1 million records, the repetition is such that it boils down to only 36 unique entries. But then it gets stuck on the next column, and I am not even sure the code finishes, since I have never seen it finish up to now.
How can I optimise it to work faster?
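One alternative I was considering, though I have not been able to verify that it is actually any faster, is to drop the index_col and collect the distinct values with Series.unique() instead of wrapping each column in set(). The file path below is just a placeholder for my real one:

import pandas as pd

address_file_path = "records.csv"  # placeholder path, the real one is set elsewhere

print("Reading CSV")
# read without index_col so the index stays a plain RangeIndex
data_set = pd.read_csv(address_file_path, low_memory=False)
print("Reading Completed")

# collect the distinct values of each column via Series.unique() rather than set()
unique_values = {}
for column in ['A', 'B', 'C', 'D', 'E', 'F']:
    unique_values[column] = data_set[column].unique()
    print(column, "done")

print("Data Transformed")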
Here is what a single record looks like (field names on the left, values on the right):
Unnamed: 0 1
A NaN
B Job Applicant;UP
C USA
D California
E Alice neh tratma
F Bharuhana
I NaN
J NaN
K SH 5
L NaN
M NaN
N NaN
O NaN
P NaN
Q NaN
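For context, this is roughly what I mean by segregating the records into individual txt files: one file per distinct value of a column. The column choice, folder name, and file naming below are only for illustration:

import os
import pandas as pd

address_file_path = "records.csv"   # placeholder path
output_dir = "split_by_A"           # made-up output folder name

os.makedirs(output_dir, exist_ok=True)

data_set = pd.read_csv(address_file_path, low_memory=False)

# write one tab-separated txt file per distinct value in column 'A'
for value, group in data_set.groupby('A'):
    out_path = os.path.join(output_dir, f"{value}.txt")
    group.to_csv(out_path, sep='\t', index=False)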