
I am trying to read a text file of around 1 million rows with the following layout:

First line: "column_header", "column_header", "column_header", "column_header"

Second line onwards: "value", "value", "value", "value"

I tried the following:

''' try 1 '''
with open(file, 'rt') as f:
    contents = f.readlines()

for i in contents:
    print(i)  # ->> seeing the text as ," value ", " value ", "
    # note: `for _ in i` iterates over the characters of the line, not its fields
    x = [_.strip().replace('""', '').split(',') for _ in i]
    print(str(x))  # ->> getting bytes

''' try 2 '''
with open(file, 'rt') as f:
    contents = f.read()

    # note: f.read() returns one string, so each i here is a single character
    for i in contents:
        print(str(i))  # ->> text, but cannot do anything with it

''' try 3 '''
frame = pd.read_csv(file, sep=',', doublequote=True, skip_blank_lines=True)  # ->> UTF-8 decoding error
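For comparison, a minimal sketch (not from the original post) of parsing this quoted, comma-plus-space format with Python's standard `csv` module instead of splitting by hand. The helper name `read_quoted_rows` and the sample data are hypothetical; for the real file, the `encoding` passed to `open` must match how the file was actually saved, which is an assumption here.

```python
import csv
import io


def read_quoted_rows(lines):
    """Parse lines like '"a", "b", "c"' into a header and lists of unquoted strings."""
    # skipinitialspace=True skips the blank after each comma, so the quote that
    # follows still opens a quoted field instead of becoming part of the value.
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)
    rows = [row for row in reader if row]  # csv yields [] for blank lines; drop them
    return header, rows


# Demo on an in-memory sample. For a real file, something like
# open(path, newline='', encoding='utf-8') would be passed instead
# (the encoding is an assumption and must match the file).
sample = io.StringIO('"col_a", "col_b"\n"1", "2"\n\n"3", "4"\n')
header, rows = read_quoted_rows(sample)
print(header)  # ['col_a', 'col_b']
print(rows)    # [['1', '2'], ['3', '4']]
```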
wjandrea
ko_00
  • For try 3, have you tried passing it `encoding='...'`, where `...` is the encoding of the file? – Jon Clements Mar 12 '20 at 13:19
  • A UTF-8 parsing error is more likely a problem with your text file or your setup (OS, environment, shell) than with Pandas. We would need to see your text file (or at least the part that fails), and probably the OS and shell you are using. – 9769953 Mar 12 '20 at 13:20
  • Use try 3 with `read_fwf` instead of `read_csv`. – Hari Mar 12 '20 at 13:24
  • To be clear, try 3 should work. A UTF error means that there was a problem decoding the text file. Please provide a minimal reproducible example. – wjandrea Mar 12 '20 at 13:26
  • Maybe https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python – ChatterOne Mar 12 '20 at 13:31
  • @JonClements: yes, with both 'python' and 'c'. @00: the above can be taken as an example; it errors at index 0, the first line I assume. @Hari: tried, but I cannot convert it to DataFrame rows. @wjandrea: see my comment to 00. @AMC: what is the best way to convert a text file (containing values such as those shown in my question) to a pd.DataFrame()? Each line should become a row; the columns are known. – ko_00 Mar 12 '20 at 23:44
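Following the suggestion to pass `encoding=`, a minimal sketch for finding an encoding that decodes the file cleanly before handing it to pandas. The function name `guess_encoding` and the candidate list are assumptions, not something from the thread. A clean decode does not prove the guess is right (cp1252, for example, accepts most byte sequences), and the function reads the whole file into memory, which may matter for very large files.

```python
import codecs


def guess_encoding(path, candidates=("utf-8", "cp1252")):
    """Return the first encoding that decodes the whole file without errors."""
    with open(path, "rb") as f:
        raw = f.read()
    # A byte-order mark at the start is a strong hint, so check it first.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise try each candidate in order; UTF-16 without a BOM is deliberately
    # not tried, because almost any even-length byte string decodes as UTF-16.
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} could decode {path}")
```

With pandas installed, the result could be passed straight to try 3, e.g. `pd.read_csv(file, encoding=guess_encoding(file), skipinitialspace=True)`; `skipinitialspace=True` is an assumption based on the `", "` separators in the sample lines.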

1 Answer


I found out that the text file I received was not UTF-8 encoded, so none of the attempts above worked. My solution: open the file, save it again as .txt with UTF-8 encoding, then use the following Python code:

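import pandas as pd

# folder_location is the directory holding the report (defined elsewhere)
file = folder_location + 'report.txt'

# try 3 from the question, now run on the file re-saved as UTF-8
frame = pd.read_csv(file, sep=',', doublequote=True, skip_blank_lines=True)
print(frame.head())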
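The manual open-and-re-save step can also be scripted. A minimal sketch, assuming the source encoding is already known; the function name `reencode_to_utf8` and the `'utf-16'` value in the usage below are hypothetical, since the answer does not say what the original encoding was.

```python
def reencode_to_utf8(src_path, dst_path, src_encoding):
    """Copy a text file to dst_path, converting it from src_encoding to UTF-8."""
    # newline='' leaves line endings untouched on both sides, and iterating
    # line by line avoids loading the whole file into memory at once.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Usage might look like `reencode_to_utf8(file, folder_location + 'report_utf8.txt', 'utf-16')`, followed by `pd.read_csv` on the new file. Alternatively, passing the source encoding directly via `pd.read_csv(file, encoding=...)` avoids writing a second copy of the data.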
ko_00