Working with pandas and PyTables. Begin by importing a table from excel containing columns of integers and floats, as well as other columns containing strings and even tuples. There are a limited number of options on the excel import and unfortunately, unlike the csv import process, datatypes must be converted from their inferred types after the import and cannot be specified in the process.
That being said, all non-numeric is apparently imported as unicode text which is incompatible with a later export to HDF5. Is there a simple way to convert all unicode columns (as well as all column headings) to an HDF5 compatible string format?
MORE DETAILS:
>>> metaFrame.head()
ProjectName Company ContactName \
LocationID
935 PCS Petaluma High School Site Testco Test Name
937 PCS Casa Grande High School Testco Test Name
3465 FUSD Fowler High School Testco Test Name
3466 FUSD Sutter Middle School Testco Test Name
3467 FUSD Fremont Elementary School Testco Test Name
Contactemail \
LocationID
935 test.address@email.com
937 test.address@email.com
3465 test.address@email.com
3466 test.address@email.com
3467 test.address@email.com
Link Systemsize(kW) \
LocationID
935 https://internal.testco.com/locations/935/syst... NaN
937 https://internal.testco.com/locations/937/syst... 675.39
3465 https://internal.testco.com/locations/3465/sys... 384.30
3466 https://internal.testco.com/locations/3466/sys... 198.90
3467 https://internal.testco.com/locations/3467/sys... 35.10
SystemCheckStartdate SystemCheckActive \
LocationID
935 2013-10-01 00:00:00 True
937 2013-10-01 00:00:00 True
3465 2013-10-01 00:00:00 True
3466 2013-10-01 00:00:00 True
3467 2013-10-01 00:00:00 True
YTDProductionPriortostartdate NumberofInverters/cktsmonitored \
LocationID
935 NaN NaN
937 NaN NaN
3465 NaN NaN
3466 NaN NaN
3467 NaN NaN
InverterMfg InverterModel \
LocationID
935 PV Powered : PVP260KW NaN
937 PV Powered : PVP260KW NaN
3465 Advanced Energy Industries : Solaron 333kW (31... NaN
3466 PV Powered : PVP260KW NaN
3467 PV Powered : PVP35KW-480 NaN
InverterCECefficiency ModuleMfg Modulemodel \
LocationID
935 97.0 NaN NaN
937 97.0 NaN NaN
3465 97.5 NaN NaN
3466 97.0 NaN NaN
3467 96.0 NaN NaN
Moduleirradiancefactor Moduleirradiancefactorslope \
LocationID
935 NaN NaN
937 NaN NaN
3465 NaN NaN
3466 NaN NaN
3467 NaN NaN
StraightLineIntercept ModuleTemp-PwrDerate MeterDK
LocationID
935 NaN 0.005 3291 ...
937 NaN 0.005 11548 ...
3465 NaN 0.005 19248 ...
3466 NaN 0.005 15846 ...
3467 NaN 0.005 15847 ...
[5 rows x 27 columns]
>>> metaFrame.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 43 entries, 935 to 5844
Data columns (total 27 columns):
ProjectName 43 non-null values
Company 43 non-null values
ContactName 43 non-null values
Contactemail 43 non-null values
Link 43 non-null values
Systemsize(kW) 42 non-null values
SystemCheckStartdate 37 non-null values
SystemCheckActive 43 non-null values
YTDProductionPriortostartdate 0 non-null values
NumberofInverters/cktsmonitored 2 non-null values
InverterMfg 42 non-null values
InverterModel 8 non-null values
InverterCECefficiency 33 non-null values
ModuleMfg 0 non-null values
Modulemodel 0 non-null values
Moduleirradiancefactor 0 non-null values
Moduleirradiancefactorslope 0 non-null values
StraightLineIntercept 0 non-null values
ModuleTemp-PwrDerate 43 non-null values
MeterDK 43 non-null values
Genfieldname 43 non-null values
WSDK 34 non-null values
WSirradianceField 43 non-null values
WSCellTempField 43 non-null values
MiscDerate 1 non-null values
InverterDKs 37 non-null values
Invertergenfields 37 non-null values
dtypes: bool(1), datetime64[ns](1), float64(9), object(16)