Concatenating 2 rows into 1 row and set it to "columns"

Question

I've encountered the following problem. I have data that's formatted nicely for the human eye but is terrible for a computer to read. E.g., it has a table format but without separators and its headers are word-wrapped, meaning that if it's longer than a specific length, the rest is written on a new line.

column 1        column 2 but          column 3
                with more text
1.5893001       1.513451              1.198420
1.5893001       1.513451              1.198420
1.5893001       1.513451              1.198420
1.5893001       1.513451              1.198420
etc...

I want the columns in a DataFrame to have the text that this header has. I'm already annoyed by the fact that I have to preprocess the data by manually converting it to a dict.

After loading this data into a DataFrame, the DataFrame's first row is the column labels where column2 is ofcourse only partially read. In the second row, the first value (of column 1) has the value of '', the second has 'with more text' and the third has NaN.

I've tried df.columns = df.iloc[0] + ' ' + df.iloc[1] which results in the second column having the correct label, but the others having nan.

Is there a way to make a conditional expression?

[edit] @jezrael for the correct formatting

{0: {0: 'column 1', 1: '', 2: 1.5893001, 3: 1.5893001}
 1: {0: 'column 2', 1: 'with more text', 2: 1.513451, 3: 1.513451}
 2: {0: 'column 3', 1: None, 2: 1.198420, 3: 1.198420}}

Also maybe better is upload some data sample file to dropbox, gdocs, because reading header is data dependent... Hard to verify with copy from text — jezrael, Mar 15 '19 at 09:43
If add parameter `header=[0,1]` to `read_csv` it working or error? — jezrael, Mar 15 '19 at 09:49
@jezrael I'd have to write this df to a tmp csv file and reread it into a DataFrame. I'll report my findings — Alb, Mar 15 '19 at 09:50
Perhaps just use StringIO instead of creating a tmp csv file? https://stackoverflow.com/questions/22604564/create-pandas-dataframe-from-a-string — Porz, Mar 15 '19 at 09:53

jezrael · Accepted Answer · 2019-03-15T10:03:42.497

1

You can replace misisng values by Series.fillna adn remove traling spaces by str.strip, then remove first 2 rows with iloc:

df.columns = (df.iloc[0] + ' ' + df.iloc[1].fillna('')).str.strip()
df = df.iloc[2:].reset_index(drop=True)
print (df)
  column 1 column 2 with more text column 3
0   1.5893                 1.51345  1.19842
1   1.5893                 1.51345  1.19842

edited Mar 15 '19 at 10:03

answered Mar 15 '19 at 09:58

jezrael

822,522
95
1,334
1,252

Concatenating 2 rows into 1 row and set it to "columns"

1 Answers1