I am reading in a bunch of files from a directory with pandas.read_csv. Some files have flaws, such as a column separator that changes partway through the file.

This makes it difficult to create a well-defined dataframe.

The affected data lines look like:

11:25;;;;;;;;;;;;;17.67;632.52;
11:30;;;;;;;;;;;;;;;
11:35,,,,,,,,,,,,,,,
11:40,,,,,,,,,,,,,18.18,633.53

I tried to skip these lines with error_bad_lines.

Here is how I read in the data:

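import pandas as pd

df = pd.read_csv(
    file_path,
    sep=sep,
    skiprows=skiprows,
    usecols=usecols,
    parse_dates=parse_dates,
    # True (the default) raises on a bad line; False would drop it instead
    error_bad_lines=True,
    # only has an effect when error_bad_lines=False
    warn_bad_lines=True,
)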

I still receive an error on my date functions and the resulting dataframe looks like:

65                                                11:10
66                                                11:15
67                                                11:20
68                                                11:25
69                                                11:30
70    11:35,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
71    11:40,,,,,,,,,,,,,18.18,633.53,11519,18.18,6.0...
72    11:45,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
73    11:50,,,,,,,,,,,,,18.55,626.05,11611,18.55,6.0...
74    11:55,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...

I want the output to look like lines 65-69 above. Lines 70 onward are wrong.

How can I solve this issue?
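
A possible reason the flag has no effect, sketched here under assumptions: pandas only counts a row as a "bad line" when it has too many fields. A comma-separated row read with sep=';' contains no separator at all, so it parses as a single long field and the remaining columns are padded with NaN. The sample data and column names below are hypothetical, and `on_bad_lines="skip"` is used because `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0.

```python
import io

import pandas as pd

# Hypothetical three-column sample in the spirit of the rows shown above.
raw = (
    "time;temp;pressure\n"
    "11:25;17.67;632.52\n"
    "11:30;;\n"
    "11:35,,\n"
    "11:40,18.18,633.53\n"
)

# on_bad_lines="skip" (pandas >= 1.3) plays the role of error_bad_lines=False.
df = pd.read_csv(io.StringIO(raw), sep=";", on_bad_lines="skip")

# Nothing was skipped: the comma rows have too FEW fields, not too many,
# so each one ends up as a single string in the first column.
print(df)

# One way to drop them afterwards: flag rows whose first column still
# contains the other separator.
suspect = df["time"].str.contains(",")
print(df[~suspect])
```

Filtering after the fact discards the values in the comma rows. If those values are needed, the separator itself has to accept both characters.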

DaCoEx
  • Try `sep=r'[,;]+', engine='python', header=None, usecols=[0]` – cs95 Jun 06 '19 at 17:45
  • Thanks a lot for the (1) swift and (2) helpful response! Could you explain, for my learning, why I need `r` and also why `+`? BTW, the solution also works with `header=0`. – DaCoEx Jun 07 '19 at 14:29
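
As a rough illustration of the `r` and `+` in the suggested separator, the sketch below uses Python's `re.split` on two shortened, hypothetical rows. A `sep` longer than one character is treated by pandas as a regular expression, which is why `engine='python'` is needed. The `r` prefix makes a raw string literal, so backslashes reach the regex engine untouched; this particular pattern has no backslashes, so the prefix is a habit here rather than a requirement. The `+` means "one or more", so a run of separators counts as a single split point.

```python
import re

# Shortened, hypothetical versions of the rows in the question.
semicolon_row = "11:25;;;17.67;632.52"
comma_row = "11:40,,,18.18,633.53"

single = r"[,;]"   # exactly one comma or semicolon per split
run = r"[,;]+"     # one or more commas/semicolons per split

# No backslashes in the pattern, so raw and normal literals are identical.
print(r"[,;]+" == "[,;]+")

for row in (semicolon_row, comma_row):
    # Without '+': empty fields survive, so column positions are kept.
    print(re.split(single, row))
    # With '+': runs of separators collapse and the empty fields vanish.
    print(re.split(run, row))
```

Because `+` drops the empty fields, the remaining values shift left and no longer line up with their original columns. That is harmless when only the first column is read, as with `usecols=[0]`; to keep every column in place, `sep=r'[,;]'` without the `+` may be the better fit. Regex separators also ignore quoting, so this assumes no field contains a quoted comma or semicolon.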