I have read an xls file into Python with pandas using pd.read_excel.
I am trying to clean up my data, but I'm way out of my league.
There is a blank line between every record. In the example picture these are Excel rows 4, 9 and 11.
There is a comments column, designated in the example (see link) as "col_F". Every record has at least one cell in col_F that contains text. The person who created this xls file split longer comments across multiple cells.
I would like to concatenate all of the data in col_F for a particular record into one cell.
I will also trim out blank records once I figure out how to properly concatenate col_F.
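In case the picture does not come through, here is a rough mock-up of the layout in code, together with one idea I have been toying with. Only col_F comes from my file. The col_A name and all of the values are made up for illustration, and I am assuming the other columns are filled in only on the first row of each record. I am not sure this is the right approach:

```python
import numpy as np
import pandas as pd

# Hypothetical mock-up: col_A and every value below are invented.
# Blank (all-NaN) rows sit where Excel rows 4, 9 and 11 would be.
df = pd.DataFrame({
    "col_A": ["rec1", np.nan, np.nan,
              "rec2", np.nan, np.nan, np.nan, np.nan,
              "rec3", np.nan],
    "col_F": ["part 1a", "part 1b", np.nan,
              "part 2a", "part 2b", "part 2c", "part 2d", np.nan,
              "part 3a", np.nan],
})

# A row is a separator when every cell in it is empty.
blank = df.isnull().all(axis=1)

# Running count of separators seen so far: all rows of one record share a value.
record_id = blank.cumsum()

# Drop the separator rows and keep the matching record ids.
rows = df[~blank]
ids = record_id[~blank]

# Join the non-empty col_F pieces of each record with a space.
comments = rows.groupby(ids)["col_F"].apply(
    lambda s: " ".join(s.dropna().astype(str))
)

# Take the first non-empty value of every other column, then attach the comments.
result = rows.groupby(ids).first()
result["col_F"] = comments
result = result.reset_index(drop=True)

print(result)
```

If this idea is sound, the blank rows would be dropped in the same step, so a separate trimming pass might not be needed.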
I am using Python 3.5.0, numpy 1.12.0 and pandas 0.19.2.
Here is what I have so far:
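import numpy as np
import pandas as pd
# Read the first sheet; header=0 takes the column names from the first row.
# read_excel already returns a DataFrame, so no extra pd.DataFrame() call is needed.
df = pd.read_excel("C:/blah/blahblah/file.xls", header=0, nrows=10000)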
I appreciate any suggestions or insight!
Thanks!