I have a dataframe displaying item_number, quantity_picked, and date_expected and would like to add a new column and fill it in automatically with the day of the week corresponding with the date (large dataset, cannot label individually).
I have tried making sure the queried data comes over as a date format, but am unsure if it is doing so successfully. It provides no errors but still lists the column as an "object". I have also tried using dataframe.dt.datetime and dataframe.dt.day_name to accomplish this to no avail.
I have tried accomplishing this by starting the query both ways shown below:
SQL = ('SELECT item_number AS UPC, quantity_picked, date_expec AS date_expected FROM [Data] ORDER BY [date_expected] ASC')
SQL = ('SELECT item_number AS UPC, quantity_picked, CAST(date_expec AS date) AS date_expected FROM [Data] ORDER BY [date_expected] ASC')
I have tried every combination of the two above and the two below attempts to adding the new column with days of the week to the dataframe:
practice_df = pd.read_sql_query(SQL, con=sql_conn, parse_dates={'date_expected':'%Y%m%d'})
practice_df['day_of_week'] = practice_df['date_expected'].dt.day_name()
print(practice_df)
practice_df = pd.read_sql_query(SQL, con=sql_conn, parse_dates={'date_expected':'%Y%m%d'})
practice_df['date_num'] = practice_df.append(pd.to_datetime(practice_df['date_expected']))
practice_df['day_of_week'] = practice_df['date_expected'].dt.day_name()
print(practice_df)
As another attempt, I stripped down the second piece of the code one line at a time and discovered removing the parse_dates piece from the line turning the query results into a dataframe and all other lines allowed the code to run without error. I then tried the following...
practice_df = pd.read_sql_query(SQL, con=sql_conn)
practice_df['date_num'] = practice_df.append(pd.to_datetime(practice_df['date_expected']))
practice_df['day_of_week'] = practice_df.append(practice_df['date_num'].dt.day_name())
print(practice_df)
I tried coming up with a solution on my own after researching the pd.read_sql_query and series.dt.datetime documentation, and looking to the following posted and answered questions for guidance:
How does parse_dates work with pd.read_sql_query
Create a day-of-week column in a Pandas dataframe using Python
When either query options and the second dataframe option, I get an error message
File "...anaconda3\lib\site-packages\numpy\core\shape_base.py", line 283, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
MemoryError
Using the first option while creating the dataframe and adding the new column the data prints as:
UPC quantity_picked date_expected day_of_week
0 0001111085148 1.0 NaT NaN
1 0001111086984 1.0 NaT NaN
2 0001111088636 1.0 NaT NaN
3 0001111097045 1.0 NaT NaN
4 0001450002690 1.0 NaT NaN
5 0001600012479 1.0 NaT NaN
6 0003800019891 1.0 NaT NaN
7 0004450034115 1.0 NaT NaN
8 0005100021165 1.0 NaT NaN
When I tried the last query to dataframe piece listed as attempted above, I received the following error:
File
"...lib\site-packages\pandas\core\internals\managers.py", line 1325, in _make_na_block
block_values = np.empty(block_shape, dtype=dtype)
MemoryError
Is there a simpler way I should be going about this or something I am missing? Any guidance is greatly appreciated.