
I have data in CSV files. I am separating the data into columns using a single tab character. Most of the rows just contain one tab character, like this:

A\tB

Some rows contain extra tabs at the end of the row, like this:

A\tB\t\t

Hence, if I do pd.read_csv(filePath, sep='\t'), then I get an error: ParserError: Error tokenizing data. C error: Expected 2 fields in line XXX, saw 4. That's because the rows with trailing tabs parse into 4 fields instead of 2.

So how can I ignore the tabs at the end of a row, if it contains extra tabs?

  • Mostly duplicate of [python - Pandas, read CSV ignoring extra commas - Stack Overflow](https://stackoverflow.com/questions/48668125/pandas-read-csv-ignoring-extra-commas) except the separator. – user202729 Dec 08 '21 at 12:10
  • Specify two extra columns (or just all four) when reading the data (with the `names` argument), then drop the last two columns after having read the dataframe. I *think* (not sure) that lines with just 2 columns will fill up the remaining columns with NaNs/Nones. – 9769953 Dec 08 '21 at 12:11
  • Does this answer your question? [Pandas, read CSV ignoring extra commas](https://stackoverflow.com/questions/48668125/pandas-read-csv-ignoring-extra-commas) – 9769953 Dec 08 '21 at 12:12
  • @user202729 Thanks, that seems to be a good duplicate. `usecols` had escaped my attention, until now. – 9769953 Dec 08 '21 at 12:13
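The approach suggested in the comments can be sketched as follows: pass `names` with extra placeholder columns so the ragged rows parse cleanly, and use `usecols` to keep only the real ones. The column names (`a`, `b`, `extra1`, `extra2`) are placeholders, not from the original question.

```python
import io
import pandas as pd

# Sample data mimicking the question: one clean row, one with trailing tabs.
data = "A\tB\nA\tB\t\t\n"

df = pd.read_csv(
    io.StringIO(data),
    sep="\t",
    header=None,
    names=["a", "b", "extra1", "extra2"],  # extra columns absorb trailing tabs
    usecols=["a", "b"],                    # keep only the real columns
)
print(df)
```

Rows without trailing tabs simply leave the placeholder columns as NaN, which `usecols` then discards, so both row shapes parse without a ParserError.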

1 Answer


Use io.StringIO to clean the file before parsing:

import pandas as pd
import io

with open('data.txt') as table:
    # Strip trailing tabs (and any other surrounding whitespace) from each
    # line, then hand the cleaned text to pandas via an in-memory buffer.
    buffer = io.StringIO('\n'.join(line.strip() for line in table))
    df = pd.read_table(buffer, header=None)

Output:

>>> df
   0  1
0  A  B
1  A  B
Corralien