I have a CSV file where some rows contain words and some contain numbers, something like:

         column1   column2  column3 
date      2019      2020      2021
color     blue      blue     yellow
velocity    1        22        3 
power       4        2         1 

I need to visualize it in a plot that lets me search visually for patterns between all the attributes (color, velocity, power) through time. But all the Plotly plots I've found only let me track quantitative or qualitative values alone, not together. I can't even picture what a plot would have to look like to allow this. The only way I can think of is to move each qualitative value to its own row and assign it an arbitrary constant number, like:

       column1   column2  column3 
date      2019      2020      2021
blue       100       100       0  
yellow      0        0        100
velocity    1        22        3 
power       4        2         1 

So in a line plot, for example, there would be a straight line at the top indicating which qualitative value is active, while all the others would sit at the bottom. And I guess it could be colored according to the row index (date, color, velocity, power), allowing me to identify it visually. But I'm quite sure there is a better way.
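For reference, the workaround described above can be sketched with pandas. This is only an illustration: it assumes the data has already been transposed so that each attribute is a column, and `LEVEL` is a hypothetical name for the arbitrary constant (100 in the table above).

```python
import pandas as pd

# The example data, transposed so that each attribute is a column
df = pd.DataFrame({
    "date": [2019, 2020, 2021],
    "color": ["blue", "blue", "yellow"],
    "velocity": [1, 22, 3],
    "power": [4, 2, 1],
})

LEVEL = 100  # arbitrary constant marking the active qualitative value

# One indicator column per color value: LEVEL where it applies, 0 elsewhere
indicators = pd.get_dummies(df["color"]).astype(int) * LEVEL

# Replace the text column with its indicator columns
wide = pd.concat([df.drop(columns="color"), indicators], axis=1)
print(wide)
```

Calling `wide.set_index("date").T` would give the row-per-attribute layout shown in the second table.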

Any plotting library is acceptable, although Plotly is preferred because it's easy.

Jason Aller

2 Answers

  • Your data as presented has four factors, so you can use a scatter plot with x, y, size and color.
  • Color is an obvious choice for the qualitative factor. The others can also be made qualitative by using categorical data types.
import io
import pandas as pd
import plotly.express as px

df = pd.read_csv(io.StringIO("""         column1   column2  column3 
date      2019      2020      2021
color     blue      blue     yellow
velocity    1        22        3 
power       4        2         1  """), sep=r"\s+").T

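# After the transpose every column has dtype object, so cast the numeric ones
df["date"] = df["date"].astype(int)
df["velocity"] = df["velocity"].astype(int)
df["power"] = df["power"].astype(int)

# Qualitative factor goes to color; quantitative factors go to x, y and size
fig = px.scatter(df, x="date", y="velocity", color="color", size="power")
# Show years as whole numbers on the x axis
fig.update_layout(xaxis={"tickformat":"d"})
fig.show()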


Rob Raymond
  • And what if there are 10 qualitative values instead of just one? Maybe the simplest way would be to create two different plots, one for the qualitative data and one for the quantitative? –  Jun 26 '21 at 14:37
  • More values are simple: it will just work. If you mean more factors/dimensions, for example financial data with asset type, trade type, market, exchange and underlying along with price and quantity, then clearly you need to limit the number of factors represented in a chart and provide options for choosing the other dimensions. – Rob Raymond Jun 26 '21 at 17:40

A standard px.scatter(df, x="date", y="velocity", color="color", size="power") seems to work just fine. But if your setup is not just a simplified representation of a more complicated real-world case, I would suggest a small tweak so that the legend does not say 'yellow' while the marker is drawn in red. That is exactly what happens by default, because color in px.scatter(color='color') does not actually assign a color; it names a categorical variable to which a color sequence is applied. And I suspect that both showing and naming the color in the legend is a bit superfluous. Anyway, I hope you'll find this useful:

fig.for_each_trace(lambda t: t.update(marker_color = t.name, name=''))


Complete code:

import pandas as pd
import plotly.express as px

df = pd.DataFrame({'date': {'column1': 2019, 'column2': 2020, 'column3': 2021},
                     'color': {'column1': 'blue', 'column2': 'blue', 'column3': 'yellow'},
                     'velocity': {'column1': 1, 'column2': 22, 'column3': 3},
                     'power': {'column1': 4, 'column2': 2, 'column3': 1}})

fig = px.scatter(df, x="date", y="velocity", color="color", size="power")

# Use each trace's category name as its marker color, then blank the legend label
fig.for_each_trace(lambda t: t.update(marker_color = t.name, name=''))

fig.show()
vestland