
I wrote a script to modify some CSV source files. In this case, I am trying to process two of them: the first is 330 MB, the second 776 MB. In Jupyter Notebook I get two result files in the correct format, but when I run the script from the Windows cmd only the first file is created. The script does process the second file (I know because it keeps running for a while after the first one finishes), but it does not create a second output file.

My code:

for i in range(lenList):
  data = pd.read_csv(cwd+file_list[i])
  content_value = data[data['[Header]'].str.contains("Content")]
  data.columns = ['Header']
  list(data)
  row_skipped = data.loc[data['Header'] == '[Data]']
  row_skipped = row_skipped.index
  row_skipped_value = row_skipped[0]+2

  ContentVal = content_value.squeeze()
  concon = ContentVal.split('Content',)
  concon = ''.join(concon)

  if concon[0] == "\t":
      concon = concon[2:]
  else:
      concon = concon[1:]

  # Drop the unused rows from the DataFrame
  data_skipped = pd.read_csv(cwd + file_list[i], sep='\t', skiprows=row_skipped_value, header=0, index_col=False)
  # Keep only the columns the program needs
  fixed_data = data_skipped[['Name', 'ID', 'A', 'B']]
  fixed_data = fixed_data.loc[(fixed_data['A'] != gap) | (fixed_data['B'] != gap)]

  # Create a CSV file from the filtered DataFrame


  fileAppendName = concon + ".csv"
  fixed_data.to_csv(fileAppendName, mode='a', header=False, index = False)

  # Create the FREQ file

  name = fixed_data['Sample ID'].unique()
  number = fixed_data.shape[0]
  temp_list = pd.DataFrame(
      {'ids': name,
       'nums': number,
      })

  fileAppendName1 = concon + "FREQ.FREQ"
  temp_list.to_csv(fileAppendName1, mode='a', header=False, index = False)

CSV files look like:

trash_col Name ID A B trash_col
trash_col Name1 ID1 A1 B1 trash_col
trash_col Name2 ID2 A2 B2 trash_col
....

Any advice on why it works in Jupyter but not as a standalone script?

EDIT: The problem with the second file is a MemoryError. I still have 10 GB of free RAM, and the error seems to come from this line:

content_value = data[data['[Header]'].str.contains("Content")]
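A MemoryError with plenty of free RAM is sometimes a sign that the script runs under a different interpreter than the notebook, for example a 32-bit Python, which can only address a few gigabytes no matter how much RAM is installed. The thread does not confirm that this was the cause here, so treat the following as a diagnostic sketch only. The helper name `interpreter_bits` is made up for illustration; run the same lines in cmd and in Jupyter and compare the output.

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # Which python.exe is actually running this script?
    print("executable:", sys.executable)
    # 32 means a 32-bit build, 64 means a 64-bit build.
    print("bits:", interpreter_bits())
    # Same check expressed through sys.maxsize.
    print("64-bit:", sys.maxsize > 2**32)
```

If the two environments report different executables or different bit widths, they are not the same Python installation.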

SOLVED: I found a solution. The problem was with Python: after reinstalling it, everything works fine. Afterwards I noticed that I was loading the same CSV three times, so I refactored the code to read it only once.
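The refactored code is not shown in the post. A minimal sketch of what a single-read version could look like follows, under assumptions that are not confirmed by the original: the file has a line containing `Content` followed by its value, a line that is exactly `[Data]`, and the tab-separated table (with its column header) starts on the next line. The function name `read_once` and the `wanted` parameter are hypothetical.

```python
import pandas as pd


def read_once(path, wanted=("Name", "ID", "A", "B")):
    """Scan the header lines, then parse the table from the same open handle."""
    content = None
    with open(path, "r") as fh:
        # readline() keeps the file position consistent for the later read_csv call.
        for line in iter(fh.readline, ""):
            stripped = line.strip()
            if "Content" in stripped:
                # Assumption: the value is whatever follows the word "Content".
                content = stripped.split("Content", 1)[1].strip()
            elif stripped == "[Data]":
                break
        else:
            raise ValueError("no [Data] marker found in " + str(path))
        # The handle now points at the table's column header row.
        # usecols drops the unneeded columns while parsing, which saves memory.
        table = pd.read_csv(fh, sep="\t", usecols=list(wanted))
    return content, table
```

Loading only the needed columns with `usecols` and never building a one-column DataFrame of the whole file should lower peak memory noticeably for files of several hundred megabytes.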

martin
  • Try running the script against the second file only. Does it work? – Alderven Mar 22 '19 at 12:19
  • I ran it with only the second file and got a MemoryError on this line: content_value = data[data['[Header]'].str.contains("Content")] Yesterday I had the same error on another line, but that was a problem with a logical operation. How can I get past it? Can you see where the problem could be? – martin Mar 22 '19 at 12:47
  • So optimize your script or upgrade your PC – Alderven Mar 22 '19 at 12:49
  • I have 10 GB of RAM free while the script is running, and Jupyter works fine. Why? – martin Mar 22 '19 at 12:51
  • https://stackoverflow.com/questions/5537618/memory-errors-and-list-limits – Alderven Mar 22 '19 at 13:20
  • I read that before I wrote the post. It didn't help. I found a solution: the problem was with Python. After a reinstall, everything works fine. Afterwards I noticed that I was loading the same CSV three times, so I refactored the code to read it once. – martin Mar 24 '19 at 13:33

0 Answers