Substituting variables when using Dataframes

Question

I am trying to apply to_datetime formatting across multiple columns, creating a new column with a prefix for each one. The issue I seem to be having is substituting the column header into the to_datetime call. Run manually, the command below works:

pipeline['pyCreated_Date'] = pd.to_datetime(pipeline.Created_Date, errors='raise')

But I get an AttributeError: 'DataFrame' object has no attribute 'dh' when I try to iterate. I have searched for answers and tried various approaches based on "Renaming pandas data frame columns using a for loop", but I appear to be missing some fundamental information. Here's my most recent code:

date_header = ['Created_Date', 'End_Date', 'Expected_Book_Date', 'Last_Modified_Date',
               'Start_Date', 'Workspace_Won/Lost_Date', 'pyCreated_Date']
for dh in date_header:
    pipeline['py' + dh.format()] = pd.to_datetime(
               pipeline.dh.format(), errors='raise')

It appears dh is not being recognised as a variable, as the error reads:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-121-d00bf0a5a7fd> in <module>()
      3 date_header = ['Created_Date', 'End_Date', 'Expected_Book_Date', 'Last_Modified_Date', 'Start_Date', 'Workspace_Won/Lost_Date']
      4 for dh in date_header:
----> 5     pipeline['py' + dh.format()] = pd.to_datetime(pipeline.dh.format(), errors='raise')

/usr/local/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
   4370             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4371                 return self[name]
-> 4372             return object.__getattribute__(self, name)
   4373 
   4374     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'dh'

What is the correct syntax to achieve this please? Apologies if it's a rookie mistake but I appreciate your support.

Many thanks

UPDATED after ALollz's kind help!

Here's what finally worked

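for col_name in date_header:
    # Bracket indexing looks up the column whose label is the value of col_name
    pipeline['py' + col_name] = pd.to_datetime(pipeline[col_name], errors='coerce')
# Outside the loop, so this shows counts for the last converted column only
print(pipeline['py' + col_name].value_counts(dropna=False))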

When you write `pipeline.dh` it is literally looking for the column labeled `'dh'` in your DataFrame (which does not exist, thus the error). Because you instead want to reference the column labeled by the variable stored in `dh` you should use `pipeline[dh].format()` — ALollz, Jul 05 '18 at 13:11
@ALollz Many thanks! I had to drop the .format(). Here's what finally worked: for col_name in date_header: pipeline['py'+ col_name.format()] = pd.to_datetime(pipeline[col_name], errors='coerce') print(f"{pipeline['py'+ col_name.format()].value_counts(dropna=False)}") — Chunkylump, Jul 05 '18 at 18:58
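
To illustrate the point made in the comments, here is a minimal sketch of the difference between attribute access and bracket indexing. The sample data is made up for illustration (the real `pipeline` DataFrame is not shown in the question), and only two of the column names from `date_header` are used.

```python
import pandas as pd

# Hypothetical sample data; the real `pipeline` DataFrame is not shown in the question.
pipeline = pd.DataFrame({
    "Created_Date": ["2018-07-01", "2018-07-02"],
    "End_Date": ["2018-08-01", "not a date"],
})

date_header = ["Created_Date", "End_Date"]

for dh in date_header:
    # pipeline.dh would look for a column literally labeled "dh" and raise
    # AttributeError; pipeline[dh] uses the string stored in the variable dh.
    # errors="coerce" turns unparseable values into NaT instead of raising.
    pipeline["py" + dh] = pd.to_datetime(pipeline[dh], errors="coerce")

print(pipeline.dtypes)
```

With `errors='raise'`, as in the original attempt, the unparseable value in this sample would stop the loop with an exception rather than produce a NaT, so the choice between the two depends on how dirty the real columns are.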
