I wrote a script to modify some CSV source files. In this case I am trying to process two of them: the first is 330 MB, the second 776 MB. In a Jupyter notebook I get two result files in the correct format, but when I run the script from the Windows cmd only the first file is created. The script does process the second file (I know this because it keeps running for a while after the first one finishes), but it never creates the second output file.
My code:
import pandas as pd

for i in range(lenList):  # lenList, cwd, file_list and gap are defined earlier
    data = pd.read_csv(cwd + file_list[i])
    content_value = data[data['[Header]'].str.contains("Content")]
    data.columns = ['Header']
    # Find the [Data] marker row; the real table starts two lines below it
    row_skipped = data.loc[data['Header'] == '[Data]'].index
    row_skipped_value = row_skipped[0] + 2
    # Extract the content name from the "Content" header line
    ContentVal = content_value.squeeze()
    concon = ''.join(ContentVal.split('Content'))
    if concon[0] == '\t':
        concon = concon[2:]
    else:
        concon = concon[1:]
    # Re-read the file, skipping the unused header rows
    data_skipped = pd.read_csv(cwd + file_list[i], sep='\t',
                               skiprows=row_skipped_value,
                               header=0, index_col=False)
    # Keep only the columns the program needs
    fixed_data = data_skipped[['Name', 'ID', 'A', 'B']]
    fixed_data = fixed_data.loc[(fixed_data['A'] != gap) | (fixed_data['B'] != gap)]
    # Create a CSV file from the fixed DataFrame
    fileAppendName = concon + ".csv"
    fixed_data.to_csv(fileAppendName, mode='a', header=False, index=False)
    # Create the FREQ file
    name = fixed_data['ID'].unique()
    number = fixed_data.shape[0]
    temp_list = pd.DataFrame({'ids': name, 'nums': number})
    fileAppendName1 = concon + "FREQ.FREQ"
    temp_list.to_csv(fileAppendName1, mode='a', header=False, index=False)
The CSV files look like:

trash_col  Name   ID   A   B   trash_col
trash_col  Name1  ID1  A1  B1  trash_col
trash_col  Name2  ID2  A2  B2  trash_col
....
Any advice on why it works in Jupyter but not standalone?
EDIT: the problem with the second file is a MemoryError. I still have 10 GB of free RAM, and the problem seems to be this line:
content_value = data[data['[Header]'].str.contains("Content")]
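Since the metadata sits at the top of the file, one way to avoid the MemoryError is to read only the first few lines with plain file I/O instead of loading the entire 776 MB CSV just to locate "Content" and "[Data]". A minimal sketch, assuming the file layout shown above; the function name `scan_header` and the 500-line cap on the header block are my own assumptions:

```python
from itertools import islice

def scan_header(path, max_header_lines=500):
    """Read only the top of the file to find the Content line and how
    many rows to skip, instead of loading the whole CSV into pandas."""
    with open(path, encoding="utf-8") as f:
        head = list(islice(f, max_header_lines))
    content_line = next(line.strip() for line in head if "Content" in line)
    # skip every line up to and including the [Data] marker
    skiprows = next(i for i, line in enumerate(head)
                    if line.startswith("[Data]")) + 1
    return content_line, skiprows
```

The returned `skiprows` can then be passed straight to `pd.read_csv(..., skiprows=skiprows)`, so pandas only ever parses the data block.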
SOLVED:
I found a solution. The problem was with my Python installation; after a reinstall, everything works fine. Afterwards I also noticed that I was loading the same CSV three times, so I refactored the code down to a single read.
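A minimal sketch of what such a single-read refactor could look like, assuming the file layout above (a header block containing a `Content` line, a `[Data]` marker, then a tab-separated table). The function name `process_file` is my own, and the column names and `gap` sentinel follow the snippet above; adjust them to your real files:

```python
import pandas as pd

def process_file(path, gap=""):
    # Scan the header with cheap file I/O instead of a full pd.read_csv
    content_name = None
    data_row = None
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f):
            if content_name is None and "Content" in line:
                # keep everything after the "Content" label
                content_name = line.split("Content", 1)[1].strip()
            if line.startswith("[Data]"):
                data_row = lineno
                break

    # Single pandas read per file, starting just below the [Data] marker
    data = pd.read_csv(path, sep="\t", skiprows=data_row + 1,
                       header=0, index_col=False)
    fixed = data[["Name", "ID", "A", "B"]]
    fixed = fixed.loc[(fixed["A"] != gap) | (fixed["B"] != gap)]
    fixed.to_csv(content_name + ".csv", mode="a", header=False, index=False)
    return fixed
```

This keeps only one `pd.read_csv` call per input file, so the big files are parsed once instead of repeatedly.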