I've encountered the following problem. I have data that's formatted nicely for the human eye but is terrible for a computer to read. E.g., it has a table format but without separators and its headers are word-wrapped, meaning that if it's longer than a specific length, the rest is written on a new line.
column 1 column 2 but column 3
with more text
1.5893001 1.513451 1.198420
1.5893001 1.513451 1.198420
1.5893001 1.513451 1.198420
1.5893001 1.513451 1.198420
etc...
I want the columns in a DataFrame to have the text that this header has. I'm already annoyed by the fact that I have to preprocess the data by manually converting it to a dict.
After loading this data into a DataFrame, the DataFrame's first row is the column labels where column2 is ofcourse only partially read. In the second row, the first value (of column 1) has the value of '', the second has 'with more text' and the third has NaN.
I've tried df.columns = df.iloc[0] + ' ' + df.iloc[1]
which results in the second column having the correct label, but the others having nan
.
Is there a way to make a conditional expression?
[edit] @jezrael for the correct formatting
{0: {0: 'column 1', 1: '', 2: 1.5893001, 3: 1.5893001}
1: {0: 'column 2', 1: 'with more text', 2: 1.513451, 3: 1.513451}
2: {0: 'column 3', 1: None, 2: 1.198420, 3: 1.198420}}