What function and parameters are available in Pandas in order to open a tab delimited text file?

Question

I have a text file as follows:

   Movie_names Rating
      "A"         10
      "B"         6.5

The text file is tab delimited. Some movie titles are enclosed in double quotes. How can I read it into a pandas DataFrame with the quotes removed from the movie names?

I tried using the following code:

import pandas as pd
data = pd.read_csv("movie.txt")

However, it says there is a Unicode decode error. What should be done?

I get a whole range of errors. It ends with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 2: invalid continuation byte" and it is NOT a file with a csv extension. It has a .txt extension. — Mainul Islam, Oct 11 '16 at 21:01
[How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) ;) — MaxU - stand with Ukraine, Oct 11 '16 at 21:02
To solve the `UnicodeDecodeError` you need to specify the encoding while loading (see my answer below). — mfitzp, Oct 11 '16 at 21:25
The encoding error was showing up because the original text file was saved with ANSI encoding. I saved it again with UTF-8 encoding and the problem was solved. — Mainul Islam, Dec 01 '16 at 07:39

mfitzp · Accepted Answer · 2016-10-11T21:30:11.483

First, you can read tab-delimited files using either `read_table` or `read_csv`. The former uses a tab delimiter by default; for the latter you need to specify it:

import pandas as pd
df = pd.read_csv('yourfile.txt', sep='\t')

Or:

import pandas as pd
df = pd.read_table('yourfile.txt')

If you are receiving encoding errors, it is because the file is not stored in the encoding pandas uses to decode it (UTF-8 by default). You can solve this by specifying the encoding directly, for example for UTF-8:

import pandas as pd
df = pd.read_table('yourfile.txt', encoding='utf8')

If your file is using a different encoding, you will need to specify that instead. A file saved as "ANSI" on Windows, for example, is typically `cp1252`.
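As a minimal sketch of the encoding fix, the snippet below writes a small tab-delimited file in `cp1252` and reads it back. The file name `movie_demo.txt`, the sample title, and the choice of `cp1252` are assumptions made for illustration (the question only says the file was saved as "ANSI"); substitute whatever encoding your file really uses.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: tab-delimited, with a quoted non-ASCII title.
# "\u00f1" is the letter n-tilde, which cp1252 stores as the single byte 0xf1.
text = 'Movie_names\tRating\n"Se\u00f1orita"\t10\n"B"\t6.5\n'

path = os.path.join(tempfile.mkdtemp(), "movie_demo.txt")
with open(path, "w", encoding="cp1252", newline="") as fh:
    fh.write(text)

# Decoding as UTF-8 (the pandas default) fails on the byte 0xf1.
try:
    pd.read_csv(path, sep="\t")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Naming the actual encoding lets the read succeed; the default
# quotechar='"' also strips the enclosing double quotes.
df = pd.read_csv(path, sep="\t", encoding="cp1252")
print(df)
```

If you do not know the encoding, re-saving the file as UTF-8 in a text editor (as the asker eventually did) avoids the need for the `encoding` argument altogether.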

score 0 · Answer 2 · answered Oct 11 '16 at 20:50

First you'll want to import pandas and read the file:

import pandas
Df = pandas.read_csv("file.csv")

Get rid of double quotes with

Df2 = Df['columnwithquotes'].apply(lambda x: x.replace('"', ''))
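A hedged variant of the same idea uses the vectorized `.str` accessor, which skips missing values instead of raising on them as a plain `lambda` with `x.replace` would. The column name `columnwithquotes` is taken from the line above; the sample rows are invented for illustration.

```python
import pandas as pd

# Invented sample frame: quoted titles plus one missing value.
df = pd.DataFrame(
    {"columnwithquotes": ['"A"', '"B"', None], "Rating": [10, 6.5, 7]}
)

# regex=False treats '"' as a literal character rather than a pattern.
df["columnwithquotes"] = df["columnwithquotes"].str.replace('"', "", regex=False)
print(df)
```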

Mpark

I get a whole range of errors. It ends with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 2: invalid continuation byte" and it is NOT a file with a csv extension. It has a .txt extension. – Mainul Islam Oct 11 '16 at 21:04
I am using Python 3, so that may be the reason for the Unicode error. I believe csvreader has the ability to read text files and convert to CSV first. – Mpark Oct 11 '16 at 21:19

Romain · Answer 3 · 2016-10-11T21:18:44.613

You can use `read_table`: its `quotechar` parameter is set to '"' by default, so the enclosing double quotes are removed while parsing.

import pandas as pd
from io import StringIO

# Fields are separated by tab characters, written here as \t escapes.
the_data = (
    'A\tB\tC\tD\n'
    'ABC\t2016-6-9 0:00\t95\t"foo foo"\n'
    'ABC\t2016-6-10 0:00\t0\t"bar bar"\n'
)
df = pd.read_table(StringIO(the_data))
print(df)

#      A               B   C        D
# 0  ABC   2016-6-9 0:00  95  foo foo
# 1  ABC  2016-6-10 0:00   0  bar bar
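To see that it is the quoting settings doing the work, the following sketch parses the question's sample data twice: once with the defaults and once with quoting switched off via `csv.QUOTE_NONE`. The variable names `raw`, `stripped`, and `kept` are illustrative only.

```python
import csv
from io import StringIO

import pandas as pd

# The question's sample rows, with explicit tab separators.
raw = 'Movie_names\tRating\n"A"\t10\n"B"\t6.5\n'

# Defaults (quotechar='"', quoting=csv.QUOTE_MINIMAL): quotes are consumed.
stripped = pd.read_csv(StringIO(raw), sep="\t")

# Quoting disabled: the quote characters stay in the values.
kept = pd.read_csv(StringIO(raw), sep="\t", quoting=csv.QUOTE_NONE)

print(stripped["Movie_names"].tolist())  # ['A', 'B']
print(kept["Movie_names"].tolist())      # ['"A"', '"B"']
```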
