Each of my research data files ("*.dat") has up to 2,000,000 data lines. The number of columns can differ from line to line. Below is an example.
FRAM_# 0 0(fs) CN= 1 PRMRYTGT 14689 H 15449 O 1.008
FRAM_# 1100 275(fs) CN= 2 PRMRYTGT 14689 H 17402 O 1.257 15449 O 1.430
FRAM_# 303200 75800(fs) CN= 0 PRMRYTGT_BD 14689 H
FRAM_# 921200 230300(fs) CN= 1 PRMRYTGT_BD 14689 H 8375 O 1.062
FRAM_# 1078700 269675(fs) CN= 1 PRMRYTGT_BD 14689 H 12971 O 1.507
FRAM_# 18203400 4550850(fs) CN= 1 PRMRYTGT_BD 14689 H 16172 O 1.507
Columns are separated by whitespace. How can I read data like the above using pandas, SciPy, or any other suitable module? In addition, the files may contain duplicate lines. If so, how can I filter out the duplicates? Any further suggestions would be highly appreciated.
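
One possible approach, as a minimal sketch rather than a definitive solution: split each line on whitespace, keep the leading fields as ordinary columns, and collect the variable-length tail into a tuple of (id, element, distance) triples. The field layout is inferred from the sample above, and the function names (`parse_line`, `read_dat`) and column names (`frame`, `time_fs`, `cn`, `label`, `target_id`, `target_element`, `neighbours`) are hypothetical labels chosen for illustration. Duplicates are dropped on the whitespace-normalized text of each line before parsing.

```python
import io
import pandas as pd

def parse_line(line):
    """Turn one record into fixed fields plus a variable-length neighbour tuple."""
    tok = line.split()
    # Assumed layout (inferred from the sample):
    # FRAM_# <frame> <time>(fs) CN= <cn> <label> <id> <element> [<id> <element> <dist>]...
    rest = tok[8:]
    neighbours = tuple(
        (int(rest[i]), rest[i + 1], float(rest[i + 2]))
        for i in range(0, len(rest) - 2, 3)
    )
    return {
        "frame": int(tok[1]),
        "time_fs": int(tok[2].replace("(fs)", "")),
        "cn": int(tok[4]),
        "label": tok[5],
        "target_id": int(tok[6]),
        "target_element": tok[7],
        "neighbours": neighbours,
    }

def read_dat(handle):
    """Read an open text file, drop blank and duplicate lines, return a DataFrame."""
    # dict.fromkeys keeps the first occurrence of each line and preserves order.
    unique = dict.fromkeys(
        " ".join(line.split()) for line in handle if line.strip()
    )
    return pd.DataFrame([parse_line(line) for line in unique])

# Small in-memory sample; the second line is repeated on purpose.
sample = """\
FRAM_# 0 0(fs) CN= 1 PRMRYTGT 14689 H 15449 O 1.008
FRAM_# 1100 275(fs) CN= 2 PRMRYTGT 14689 H 17402 O 1.257 15449 O 1.430
FRAM_# 1100 275(fs) CN= 2 PRMRYTGT 14689 H 17402 O 1.257 15449 O 1.430
FRAM_# 303200 75800(fs) CN= 0 PRMRYTGT_BD 14689 H
"""
df = read_dat(io.StringIO(sample))
```

For a real file, the same function could be called as `with open("data.dat") as fh: df = read_dat(fh)` (the file name here is a placeholder). If "duplicate" should instead mean, for example, the same frame number regardless of the rest of the line, `df.drop_duplicates(subset="frame")` on the resulting DataFrame would be an alternative.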