1

I have an info.txt file that looks like this:

B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100

You can see that the first 3 rows have 10 columns but the fourth row has 11 columns, so when I read this file:

    import pandas as pd
    import numpy as np
    df = pd.read_csv(r'C:\Users\Petter\Desktop\info.txt', sep=r"\s+", header=None, dtype=str, engine="python")
    df

I get this and an error:

    0   1   2   3   4   5   6   7   8   9
0   B   19960331    00100000    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
1   B   19960430    00099100    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
2   B   19960531    00098500    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000

Skipping line 4: Expected 10 fields in line 4, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

Ideally, pandas would automatically add one more column to the DataFrame. The output should look like this:

    0   1   2   3   4   5   6   7   8   9  10
0   B   19960331    00100000    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
1   B   19960430    00099100    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
2   B   19960531    00098500    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000

I tried:

df = pd.DataFrame(pd.np.empty((0, 11))) 

But it did not work.

William

2 Answers

3

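This works and might fit your needs. Passing names with 11 labels tells pandas to expect 11 columns, so shorter rows are padded with NaN: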

df = pd.read_csv(... names=range(11))
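
A minimal, self-contained sketch of the idea, assuming two of the question's rows pasted into an in-memory buffer (`SAMPLE` and `io.StringIO` stand in for info.txt and are not part of the original answer). Because `names` lists 11 labels, the shorter row should be padded with NaN in column 10 instead of triggering the field-count error:

```python
import io
import pandas as pd

# Two rows copied from the question: the first has 10 fields, the second 11.
SAMPLE = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100
"""

# names=range(11) fixes the column count at 11 up front.
# dtype=str keeps the leading zeros intact.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",
    header=None,
    names=range(11),
    dtype=str,
)
print(df)
```

Missing cells come back as NaN even with `dtype=str`; `df.fillna("")` is one way to blank them if that suits the data better.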


jch
  • Hi @jch, can you help me with this question? https://stackoverflow.com/questions/68309137/how-to-cross-checking-2-pandas-dataframes-file-and-use-1-dataframes-value-as-a – William Jul 08 '21 at 22:10
1

You can use the error_bad_lines argument to avoid this error (in pandas 1.3 and later, the equivalent is on_bad_lines="skip").

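import pandas as pd
import numpy as np
# Raw string so the backslashes in the Windows path are not treated as escapes.
# error_bad_lines=False skips rows that have more fields than expected.
df = pd.read_csv(r"C:\Users\Petter\Desktop\info.txt", header=None, delimiter=r"\s+", error_bad_lines=False)
df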
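
To illustrate what this option does, here is a small sketch, assuming pandas 1.3 or later (hence `on_bad_lines="skip"` in place of `error_bad_lines=False`) and an in-memory `SAMPLE` string standing in for info.txt. The 11-field row is dropped silently, so this only fits if losing that row is acceptable:

```python
import io
import pandas as pd

# Three rows from the question: two with 10 fields, one with 11.
SAMPLE = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100
"""

# The column count is inferred from the first row (10 fields),
# so the 11-field row counts as "bad" and is skipped.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",
    header=None,
    dtype=str,
    on_bad_lines="skip",
)
print(df.shape)
```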
  • I already tried that, and it makes pandas skip the offending row, but I still need that row. – William Jun 22 '21 at 20:30
  • Ah, OK. If you need that row, you have to state the number of columns to read explicitly, using range. – Raja Wajahat Jun 22 '21 at 20:41
  • Hi @Raja, can you help me with this question? https://stackoverflow.com/questions/68309137/how-to-cross-checking-2-pandas-dataframes-file-and-use-1-dataframes-value-as-a – William Jul 08 '21 at 22:10
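
If the widest row is not known in advance, one possible approach is to scan the text once to find the largest field count and then pass that to `range`. The sketch below assumes whitespace-separated data small enough to hold in memory; `read_ragged` and `SAMPLE` are hypothetical names used only for illustration:

```python
import io
import pandas as pd


def read_ragged(text):
    """Hypothetical helper: parse whitespace-separated text with uneven rows."""
    # First pass: find the largest number of fields on any non-blank line.
    width = max(len(line.split()) for line in text.splitlines() if line.strip())
    # Second pass: let pandas parse, padding short rows up to `width` columns.
    return pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        header=None,
        names=range(width),
        dtype=str,
    )


# Two rows copied from the question: 10 fields and 11 fields.
SAMPLE = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100
"""

df = read_ragged(SAMPLE)
print(df.shape)
```

For a file on disk, the text could come from reading the file into a string first; the extra pass costs one full read of the data.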