I have a tab-delimited set of data with some columns not in the correct order, and missing.
Some rows have 12 columns, others have 13 or 14 or 15 columns.
When I look at the data, I find the following types of rows
.... Col_10: 25 Col_11: 23432 Col12: 639142
.... Col_10: 25 Col_12: 25134 Col13: 243344
.... Col_11: 75 Col_13: 79876 Col15: 634534 Col12: 5 Col14: 73453
.... Col_10: 25 Col_11: 32425 Col13: 989423
.... Col_10: 25 Col_11: 23424 Col12: 342421 Col13: 7 Col14: 13424 Col 15: 67
.... Col_10: 95 Col_11: 32121 Col15: 111231
So, each column from Col_10
to Col_15
is labeled such that column_name: value
.
I would like to read this data into a pandas dataframe using
import pandas as pd
df = pd.read_table("fname.dat")
where fname.dat
is the format for the data above.
How do you do this? It appears I need to allocate space for 15 columns, and then parse the data such that if it begins with Col_10
, this value should be put in the 10th column, if Col_11
then in the 11th, etc.
How does one read in such data with pandas?