I am trying to run Python code in Microsoft Azure, and the explode function from the Pandas library keeps throwing errors.
My code runs perfectly fine locally in Spyder (with Pandas version 1.4.4), but it does not work when I run it on Azure. I know the error is not caused by rows whose lists have different lengths, because the same code runs as desired locally on the same data. The issue only appears on Azure.
When I try to use the multi-column explode function (which is available for Pandas version 1.3.0 and later), I get the following error: column must be a scalar
If I instead use the DataFrame.apply method with a custom lambda function to explode only my desired columns (based on this StackOverflow answer), while keeping the rest of the columns as they are, I get the following error: cannot import name 'AggFuncType' from 'pandas._typing'
I have included the line os.system("pip install pandas==1.4.4") in the code block which I made on Microsoft Azure, so I am not sure whether my code is somehow still using an outdated version of Pandas even though I have explicitly asked Azure to install Pandas 1.4.4, or whether the error is coming from somewhere else in my code.
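One way to tell these two possibilities apart is to print, from inside the same Azure code block, which pandas the interpreter has actually imported. The following is only a diagnostic sketch: the helper name describe_pandas_environment is hypothetical, and the 1.3 threshold comes from the multi-column explode requirement mentioned above.

```python
import sys

import pandas as pd


def describe_pandas_environment():
    """Collect which interpreter and which pandas install are actually in use."""
    return {
        "python_executable": sys.executable,
        "pandas_version": pd.__version__,
        "pandas_location": pd.__file__,
    }


info = describe_pandas_environment()
for key, value in info.items():
    print(f"{key}: {value}")

# Multi-column explode needs pandas 1.3.0 or later; compare major.minor only.
major_minor = tuple(int(part) for part in pd.__version__.split(".")[:2])
supports_multi_column_explode = major_minor >= (1, 3)
print("multi-column explode available:", supports_multi_column_explode)
```

If the printed version is not 1.4.4, the pip install call did not affect the pandas that the running process imported.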
Edit
I am using the ML Studio in Microsoft Azure to create a machine learning pipeline.
An example of my code (I changed the variable names but the idea is the same) is new_df = new_df.explode(column = ['name', 'id'], ignore_index = True). In my code, new_df is a pandas DataFrame which contains two columns called name and id which contain lists, alongside 15 or so other columns which contain single values (I got the data using a web scraper). The name column contains a list of names, and the id column contains a list of corresponding IDs. I know for a fact that these lists have the same lengths, as this code runs without errors locally and works as desired.
Edit
Apparently, the ML Studio in Microsoft Azure requires Pandas version 1.0.4, so the code I was using to install a newer version of pandas was not actually doing anything: Azure automatically forces 1.0.4 instead of the version you ask for. That is consistent with the first error, since multi-column explode is only available from Pandas 1.3.0 onward.
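If the environment really is pinned to an older pandas, the multi-column explode can be reproduced without calling DataFrame.explode at all. The following is a minimal sketch, not a drop-in replacement: the helper name explode_columns and the sample data are made up for illustration, it assumes every row holds lists of equal length in the chosen columns, and unlike the built-in explode it drops rows whose lists are empty.

```python
import pandas as pd


def explode_columns(df, columns):
    """Explode several list-valued columns in lockstep, one output row per list element."""
    other_columns = [c for c in df.columns if c not in columns]
    rows = []
    for record in df.to_dict(orient="records"):
        lists = [record[c] for c in columns]
        # Mirror the built-in check: all lists in a row must have the same length.
        if len({len(values) for values in lists}) != 1:
            raise ValueError("list columns must have matching lengths in every row")
        for values in zip(*lists):
            new_row = {c: record[c] for c in other_columns}  # copy scalar columns as-is
            new_row.update(zip(columns, values))             # one element per list column
            rows.append(new_row)
    # Rebuild with the original column order and a fresh 0..n-1 index.
    return pd.DataFrame(rows, columns=df.columns)


# Hypothetical sample data shaped like the DataFrame described above.
new_df = pd.DataFrame({
    "name": [["Ann", "Bob"], ["Cy"]],
    "id": [[1, 2], [3]],
    "source": ["page1", "page2"],
})
exploded = explode_columns(new_df, ["name", "id"])
print(exploded)
```

Because it loops over rows in plain Python, this will be slower than the built-in method on large DataFrames, but it relies only on basic pandas calls.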