I have a data frame that I pulled from an SQL Server. The data got parsed incorrectly when converted to .csv and now I have columns that contain the wrong data. I'm trying to move everything back into place using pandas. Specifically, I have a field that should contain a "short description". Some of the descriptions were delimited into separate fields, and I'd like to concatenate them all into the appropriate field. The problem is, some of the fields contain dates that rightly belong there, and I need to skip over them when concatenating.
I've tried to use df.apply()
in multiple different ways, but I can't seem to "skip over" the values that contain pd.Timestamp
data types.
For example:
df_test.apply(lambda x: ' '.join(x) if type(x) != pd.Timestamp else '')
Example df:
df_so_test = pd.DataFrame([[1, 2, 'some description', pd.to_datetime('2019-01-01'), 'some more text', '']
, [2, 3, 'another description', 'some other text', '', pd.to_datetime('2019-01-02')]
, [3, 4, 'a third descirption', '', pd.to_datetime('2019-01-03'), pd.to_datetime('2019-01-04')]]
, columns=['random_col_1','random_col_2', 'short_desc', 'date_1', 'date_2', 'random_col_3'])
Expected output:
df_expected = pd.DataFrame([[1, 2, 'some description some more text', pd.to_datetime('2019-01-01'), '', '']
, [2, 3, 'another description some other text', pd.to_datetime('2019-01-02'), '', '']
, [3, 4, 'a third descirption', pd.to_datetime('2019-01-03'), pd.to_datetime('2019-01-04'), '']]
, columns=['random_col_1','random_col_2', 'short_desc', 'date_1', 'date_2', 'random_col_3'])