I have an info.txt file that looks like this:
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
As you can see, the first 3 rows have 10 columns but the fourth row has 11 columns, so when I read this file:
import pandas as pd
import numpy as np
# Raw string (r'...') so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r'C:\Users\Petter\Desktop\info.txt', sep=r"\s+", header=None, dtype=str, engine="python")
df
I get this output and an error:
0 1 2 3 4 5 6 7 8 9
0 B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
1 B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
2 B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
Skipping line 4: Expected 10 fields in line 4, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
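The message suggests the parser fixes the number of columns from the first line it reads (10 fields here), so the 11-field fourth line is rejected. A minimal sketch of one workaround, assuming the widest row is known to have 11 fields, is to pass 11 column labels through `read_csv`'s `names` parameter so shorter rows are padded with NaN. Here `io.StringIO` with the sample rows stands in for the real file path, and `engine="python"` is dropped because the default engine also accepts `sep=r"\s+"`:

```python
import io

import pandas as pd

# Sample rows from info.txt; io.StringIO stands in for the real file path.
sample = """B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
"""

# Supplying 11 labels up front tells the parser to expect up to 11 fields;
# rows with only 10 fields get NaN in the last column instead of being skipped.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",
    header=None,
    names=list(range(11)),
    dtype=str,  # keep leading zeros such as "00100000"
)
print(df)
```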
Ideally, pandas would automatically add one more column to the DataFrame, so the output would look like this:
0 1 2 3 4 5 6 7 8 9 10
0 B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000 NaN
1 B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000 NaN
2 B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000 NaN
3 B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
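If the widest row is not known ahead of time, one possible approach (a sketch, not the only way) is to scan the text once for the largest field count and then pass that many labels to `names`. `read_ragged` is a hypothetical helper name, and it assumes the whole file fits in memory and contains at least one non-blank line:

```python
import io

import pandas as pd


def read_ragged(text_source):
    """Hypothetical helper: parse whitespace-separated rows of varying width.

    `text_source` is any file-like object with .read(). The text is read into
    memory once, scanned for the widest row, then parsed with that many labels.
    """
    text = text_source.read()
    # The widest row decides how many column labels to supply.
    n_cols = max(len(line.split()) for line in text.splitlines() if line.strip())
    return pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        header=None,
        names=list(range(n_cols)),
        dtype=str,  # keep leading zeros such as "00100000"
    )


# Sample rows from info.txt, used here instead of the real file.
sample = """B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
"""

df = read_ragged(io.StringIO(sample))
print(df)
```

With the real file, usage might look like `with open(r"C:\Users\Petter\Desktop\info.txt") as f: df = read_ragged(f)`.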
I tried:
df = pd.DataFrame(np.empty((0, 11)))
But that does not work: it only creates an empty DataFrame with 11 columns and has no effect on how the file is read.
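Another option, sketched here under the assumption that the file is small enough to process line by line in plain Python, is to split each line yourself and let the `DataFrame` constructor pad the shorter rows. Again, `io.StringIO` with the sample rows stands in for opening the real file:

```python
import io

import pandas as pd

# Sample rows from info.txt, used here instead of the real file.
sample = """B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
"""

# Stand-in for opening the real file with open(...).
with io.StringIO(sample) as f:
    # str.split() with no argument splits on any run of whitespace, like sep=r"\s+",
    # and every field stays a string, so leading zeros are kept.
    rows = [line.split() for line in f if line.strip()]

# Given rows of unequal length, the DataFrame constructor fills the gaps with
# missing values, so the result has as many columns as the longest row.
df = pd.DataFrame(rows)
print(df)
```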