pandas "Expected 10 fields in line 153, saw 11": how to add one more column

Question

I have an info.txt file that looks like this:

B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100

You can tell that the first 3 rows have 10 columns but the fourth row has 11 columns, so when I read this file:

    import pandas as pd
    import numpy as np
    # Raw string so the backslashes in the Windows path are not treated as escapes.
    df = pd.read_csv(r'C:\Users\Petter\Desktop\info.txt', sep=r"\s+", header=None, dtype=str, engine="python")
    df

I get this and an error:

    0   1   2   3   4   5   6   7   8   9
0   B   19960331    00100000    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
1   B   19960430    00099100    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
2   B   19960531    00098500    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000

Skipping line 4: Expected 10 fields in line 4, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

Ideally it should automatically add one more column to the df. The output should look like:

    0   1   2   3   4   5   6   7   8   9  10
0   B   19960331    00100000    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
1   B   19960430    00099100    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
2   B   19960531    00098500    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000

I tried:

df = pd.DataFrame(pd.np.empty((0, 11)))

But it does not work.

score 3 · Accepted Answer · answered Jun 22 '21 at 20:24

This works and might fit your needs:

df = pd.read_csv(..., names=range(11))
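To illustrate what `names=range(11)` does, here is a minimal sketch that parses two rows from the question (one with 10 fields, one with 11) from an in-memory string rather than from info.txt. The `sample` variable and the use of `io.StringIO` are assumptions made for the demo. Supplying 11 column names up front tells the parser to allow up to 11 fields, and rows with fewer fields get NaN in the missing trailing column.

```python
import io
import pandas as pd

# Two rows copied from the question: the first has 10 fields, the second has 11.
sample = (
    "B 19960331 00100000 00000000000000 00000000000000 00000000000000 "
    "00000000 00000000000000 00000000000000 00000000000000\n"
    "B 19980331 00107241 00107241000000 00107241000000 00107241000000 "
    "00100000 00100000000000 00100000000000 00100000000000    00000100\n"
)

# names=range(11) declares 11 columns; dtype=str keeps the leading zeros.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",
    header=None,
    names=range(11),
    dtype=str,
)

print(df.shape)  # (2, 11)
print(df[10].tolist())  # the short row has a missing value in column 10
```

This only works when you already know the widest row has 11 fields; a wider row would still need a larger range.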

jch

Hi @jch, can you help me with this question? https://stackoverflow.com/questions/68309137/how-to-cross-checking-2-pandas-dataframes-file-and-use-1-dataframes-value-as-a – William Jul 08 '21 at 22:10

score 1 · Answer 2 · answered Jun 22 '21 at 20:29

You can use the error_bad_lines argument to avoid this error.

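import pandas as pd
import numpy as np
# Raw string so the backslashes in the Windows path are not treated as escapes.
# Note: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0;
# newer versions use on_bad_lines="skip" instead.
df = pd.read_csv(r"C:\Users\Petter\Desktop\info.txt", header=None, delimiter=r"\s+", error_bad_lines=False)
df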
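Keep in mind that skipping bad lines drops the 11-field row entirely instead of widening the frame. The sketch below shows this on two rows from the question, assuming pandas 1.3 or later, where the option is spelled `on_bad_lines="skip"`. The `sample` string is an assumption standing in for info.txt.

```python
import io
import pandas as pd

# Two rows copied from the question: the first has 10 fields, the second has 11.
sample = (
    "B 19960331 00100000 00000000000000 00000000000000 00000000000000 "
    "00000000 00000000000000 00000000000000 00000000000000\n"
    "B 19980331 00107241 00107241000000 00107241000000 00107241000000 "
    "00100000 00100000000000 00100000000000 00100000000000    00000100\n"
)

# The first row fixes the width at 10 columns; the wider row is silently dropped.
skipped = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",
    header=None,
    dtype=str,
    on_bad_lines="skip",
)

print(skipped.shape)  # (1, 10)
```

So this approach only fits cases where the over-long rows can be discarded.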

Raja Wajahat

I already tried that, but it skips the offending row, and I still need that row. – William Jun 22 '21 at 20:30
Ah, OK. If you need that row, then you need to explicitly specify the number of columns to read using range. – Raja Wajahat Jun 22 '21 at 20:41
Hi @Raja, can you help me with this question? https://stackoverflow.com/questions/68309137/how-to-cross-checking-2-pandas-dataframes-file-and-use-1-dataframes-value-as-a – William Jul 08 '21 at 22:10
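If the widest row is not known in advance, one possible approach is to scan the text once, count whitespace-separated fields per line, and pass that maximum to range. This is a sketch, not something from the answers above: `sample` and `n_cols` are hypothetical names, and it assumes the data is small enough to read twice and that fields never contain quoted whitespace.

```python
import io
import pandas as pd

# Two rows copied from the question: the first has 10 fields, the second has 11.
sample = (
    "B 19960331 00100000 00000000000000 00000000000000 00000000000000 "
    "00000000 00000000000000 00000000000000 00000000000000\n"
    "B 19980331 00107241 00107241000000 00107241000000 00107241000000 "
    "00100000 00100000000000 00100000000000 00100000000000    00000100\n"
)

# First pass: find the largest field count among non-empty lines.
n_cols = max(len(line.split()) for line in sample.splitlines() if line.strip())

# Second pass: declare that many columns so no row is treated as too long.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",
    header=None,
    names=range(n_cols),
    dtype=str,
)

print(n_cols)  # 11
print(df.shape)  # (2, 11)
```

For a real file, the same count could be taken by iterating over the open file before calling read_csv on its path.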
