0

I have a dataframe displaying item_number, quantity_picked, and date_expected and would like to add a new column and fill it in automatically with the day of the week corresponding with the date (large dataset, cannot label individually).

I have tried making sure the queried data comes over as a date format, but am unsure if it is doing so successfully. It provides no errors but still lists the column as an "object". I have also tried using dataframe.dt.datetime and dataframe.dt.day_name to accomplish this to no avail.

I have tried accomplishing this by starting the query both ways shown below:

SQL = ('SELECT item_number AS UPC, quantity_picked, date_expec AS date_expected FROM [Data] ORDER BY [date_expected] ASC')

SQL = ('SELECT item_number AS UPC, quantity_picked, CAST(date_expec AS date) AS date_expected FROM [Data] ORDER BY [date_expected] ASC')

I have tried every combination of the two above and the two below attempts to adding the new column with days of the week to the dataframe:

practice_df = pd.read_sql_query(SQL, con=sql_conn, parse_dates={'date_expected':'%Y%m%d'})
practice_df['day_of_week'] = practice_df['date_expected'].dt.day_name()
print(practice_df)
practice_df = pd.read_sql_query(SQL, con=sql_conn, parse_dates={'date_expected':'%Y%m%d'})
practice_df['date_num'] = practice_df.append(pd.to_datetime(practice_df['date_expected']))
practice_df['day_of_week'] = practice_df['date_expected'].dt.day_name()
print(practice_df)

As another attempt, I stripped down the second piece of the code one line at a time and discovered removing the parse_dates piece from the line turning the query results into a dataframe and all other lines allowed the code to run without error. I then tried the following...

practice_df = pd.read_sql_query(SQL, con=sql_conn)
practice_df['date_num'] = practice_df.append(pd.to_datetime(practice_df['date_expected']))
practice_df['day_of_week'] = practice_df.append(practice_df['date_num'].dt.day_name())
print(practice_df)

I tried coming up with a solution on my own after researching the pd.read_sql_query and series.dt.datetime documentation, and looking to the following posted and answered questions for guidance:

How does parse_dates work with pd.read_sql_query

Create a day-of-week column in a Pandas dataframe using Python

When either query options and the second dataframe option, I get an error message

  File "...anaconda3\lib\site-packages\numpy\core\shape_base.py", line 283, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)

MemoryError

Using the first option while creating the dataframe and adding the new column the data prints as:

                 UPC  quantity_picked date_expected  day_of_week
0      0001111085148              1.0           NaT          NaN
1      0001111086984              1.0           NaT          NaN
2      0001111088636              1.0           NaT          NaN
3      0001111097045              1.0           NaT          NaN
4      0001450002690              1.0           NaT          NaN
5      0001600012479              1.0           NaT          NaN
6      0003800019891              1.0           NaT          NaN
7      0004450034115              1.0           NaT          NaN
8      0005100021165              1.0           NaT          NaN

When I tried the last query to dataframe piece listed as attempted above, I received the following error:

  File 
"...lib\site-packages\pandas\core\internals\managers.py", line 1325, in _make_na_block
    block_values = np.empty(block_shape, dtype=dtype)

MemoryError

Is there a simpler way I should be going about this or something I am missing? Any guidance is greatly appreciated.

Emily Reed
  • 65
  • 10

1 Answers1

0

You could just handle this in SQL Server directly, using DATENAME:

SELECT
    item_number AS UPC,
    quantity_picked,
    date_expec AS date_expected,
    DATENAME(dw, date_expec) AS day_of_week
FROM [Data]
ORDER BY [date_expected]
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360