
I have a large text file, around 450 MB. I read it in chunks, and each chunk comes back as a string.

import pandas as pd

def readInChunks(fileObj, chunkSize=2048):
    # Yield the file's contents one fixed-size chunk at a time,
    # so the whole 450 MB file is never read into memory in one call.
    while True:
        data = fileObj.read(chunkSize)
        if not data:
            break
        yield data

result = []
with open("textfile.txt") as f:   # context manager closes the file automatically
    for chunk in readInChunks(f):
        result.append(chunk)

What I get back is a list of strings; call it result. result[0] looks like this:

Alin Deutsch, Mary F. Fernandez, 1998
Alin Deutsch, Daniela Florescu, 1998
Alin Deutsch, Alon Y. Levy, 1998

Now I want this text converted to a DataFrame in the following way:

      c1                c2                    c3
r1   Alin Deutsch      Mary F. Fernandez     1998
r2   Alin Deutsch      Daniela Florescu      1998
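
A minimal sketch of one way to do that, assuming every line has exactly three fields separated by ", " (lines with extra commas, like the one mentioned in the comments below, would need special handling):

text = "".join(result)            # stitch the 2048-byte chunks back into one string
rows = [line.split(", ") for line in text.splitlines() if line.strip()]
df = pd.DataFrame(rows, columns=["c1", "c2", "c3"])

This still keeps everything in memory at once, so for a 450 MB file the chunked read_csv approach suggested in the comments below is likely the better route.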
  • have you tried the pandas read_csv() method to read the whole dataset? – anotherone Nov 09 '18 at 11:08
  • yes, but it consumes the RAM and the system goes into a "not responding" state – Talha Anwar Nov 09 '18 at 11:10
  • with the chunksize option enabled? http://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking – anotherone Nov 09 '18 at 11:12
  • also see this answer https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas – anotherone Nov 09 '18 at 11:14
  • Thanks, all of you. I have tried your suggestions, but I get an error: Error tokenizing data. C error: Expected 2 fields in line 118, saw 3. That led me to **read_fwf**, but the problem is that it's not splitting the data into columns, even though I am passing a delimiter. – Talha Anwar Nov 09 '18 at 12:26
  • You may also want to try setting engine='python' in pandas read_csv(). The Python engine is more feature-complete, so it may get past the tokenizing error (see the sketch after these comments) – anotherone Nov 09 '18 at 12:51
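
Putting those comment suggestions together (chunked read_csv with the Python engine), here is a minimal sketch; the separator, column names, chunksize, and on_bad_lines value are all assumptions, and on_bad_lines needs pandas >= 1.3 (older versions spell it error_bad_lines=False):

reader = pd.read_csv(
    "textfile.txt",
    sep=", ",                  # assumed separator: comma followed by a space
    engine="python",           # the more feature-complete parser suggested above
    header=None,
    names=["c1", "c2", "c3"],
    chunksize=100_000,         # rows parsed per chunk; tune to the available RAM
    on_bad_lines="skip",       # drop malformed rows such as line 118
)
df = pd.concat(reader, ignore_index=True)

Concatenating still materialises the full DataFrame at the end, but parsing chunk by chunk keeps peak memory during tokenising much lower.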
