
I would like to replace Pandas with Polars but I was not able to find out how to use Polars with Plotly without converting to Pandas. I wonder if there is a way to completely cut Pandas out of the process.

Consider the following test data:

import polars as pl
import numpy as np
import plotly.express as px

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)

fig = px.bar(df, x='names', y='random')
fig.show()

I would like this code to show the bar chart in a Jupyter notebook but instead it returns an error:

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/internals/frame.py:1483: UserWarning: accessing series as Attribute of a DataFrame is deprecated
  warnings.warn("accessing series as Attribute of a DataFrame is deprecated")

It is possible to transform the Polars data frame to a Pandas data frame with df = df.to_pandas(). Then, it works. However, is there another, simpler and more elegant solution?

fabioklr

2 Answers


Yes, there is no need to convert to a Pandas DataFrame. Someone (sa-) has requested better support for this here and included a workaround:

"The workaround that I use right now is px.line(x=df["a"], y=df["b"]), but it gets unwieldy if the name of the data frame is too big"

For the OP's code example, specifying the DataFrame columns explicitly works.
Besides px.bar(x=df["names"], y=df["random"]) or px.bar(df, x=df["names"], y=df["random"]), casting each column to a list also works:

import polars as pl
import numpy as np
import plotly.express as px

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)

px.bar(df, x=list(df["names"]), y=list(df["random"]))

If you know Polars better, you may see other options once you understand the idea behind the workaround.

The example posted there is simpler: instead of px.line(df, x="a", y="b"), as you could use for a Pandas DataFrame, you use px.line(x=df["a"], y=df["b"]). With Polars, that is:

import polars as pl
import plotly.express as px

df = pl.DataFrame({"a":[1,2,3,4,5], "b":[1,4,9,16,25]})

px.line(x=df["a"], y=df["b"])

(Note that using plotly.express requires Pandas to be installed, see here and here. I used plotly.express in my answer because it was closer to the OP. The code could be adapted to using plotly.graph_objects if there was a desire to not have Pandas installed & involved at all.)

Wayne
  • This is exactly the elegant solution I was hoping for. Thanks! – fabioklr Apr 05 '22 at 09:02
  • ImportError: Plotly express requires pandas to be installed. – ScipioAfricanus May 28 '23 at 15:54
  • True, that seems to be the case @ScipioAfricanus . I reworded my first line, and I'll add a reference to that to the end. The main point stands that you don't need to convert to a Pandas dataframe. – Wayne May 29 '23 at 04:06

I am currently making the switch from pandas to Polars. From my research, indexing with [] will work but is considered an anti-pattern in Polars. The author of the article linked below suggests using the .to_series method instead.

px.pie(
    df,  # Polars DataFrame
    names=df.select('Model').to_series(),
    values=df.select('Sales').to_series(),
    hover_name=df.select('Model').to_series(),
    color_discrete_sequence=px.colors.sequential.Plasma_r,
)

https://towardsdatascience.com/visualizing-polars-dataframes-using-plotly-express-8da4357d2ee0

When it comes to visualizing a Polars DataFrame, it seems you can't completely get rid of the conversion to a pandas DataFrame.

Hope this helped