
I am using the pandas library in Python to generate data with multi-indexed columns. The two column levels are category and source. I save this data as a .csv file. In the file, the first row holds the category values, the second row holds the corresponding source values, and the data follows. I load this file into Orange3 to visualize it, but Orange takes only the first row as the column names. How do I make it use the combination of the two rows as the column name?

I am just trying to visualize the whole thing as a histogram, if possible.

  1. Since there are effectively 2 (category and source) + 1 (the row label) variables, a 3D visualization would be best, or
  2. with 1 (category and source combined into one variable) + 1 (the row label) variables, a 2D visualization.

category    1    1    1    1    1    2    2
source      a    b    c    d    e    f    g
label
l1          1    2    3    4    5    6    7
l2          4    5    6    7    8    9    10

TheRajVJain

1 Answer


According to the documentation, Orange does not support reading multi-indexed data.

In order to visualize the data, you will need to convert it to a normal tabular format (one column per feature) before exporting the data to csv.

One way to do it is the DataFrame's unstack method:

df.unstack().to_csv("file.csv")

This will produce the file in the following format:

category    source    label
1           a         l1      1
1           a         l2      4
1           b         l1      2
...

This way, you can use category and source as separate variables in Orange.
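As a minimal sketch of the unstack approach, the snippet below rebuilds the example table from the question and converts it to long format. The column name "value" and the use of an in-memory buffer instead of a file on disk are assumptions made for illustration; they are not part of the original answer.

```python
import io

import pandas as pd

# Rebuild the example table from the question.
columns = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 1, 2, 2], list("abcdefg")],
    names=["category", "source"],
)
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7], [4, 5, 6, 7, 8, 9, 10]],
    index=pd.Index(["l1", "l2"], name="label"),
    columns=columns,
)

# With a single-level row index, unstack() returns a Series
# indexed by (category, source, label).
long_series = df.unstack()

# Name the values and turn the index levels into ordinary columns.
# "value" is a name chosen here for illustration.
long_df = long_series.rename("value").reset_index()

# Write to an in-memory buffer; pass a path such as "file.csv" to write to disk.
buffer = io.StringIO()
long_df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Naming the value column gives the CSV a complete header row, so every column, including the numeric one, should arrive in Orange with a name.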


To join category and source, you need to flatten the hierarchical index before exporting to csv:

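# str() is needed when a level holds non-string values, such as integer categories
df.columns = [' '.join(str(level) for level in col).strip() for col in df.columns.values]
df.to_csv("file.csv")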

This will produce the data in the following format:

label       1 a       1 b ...
l1          1         2
l2          4         5
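
The flattening step can be tried on the example from the question with a sketch like the following. It assumes the category level holds integers, which is why each level is passed through str() before joining; the variable names flat and csv_text are illustrative only.

```python
import pandas as pd

# Rebuild the example table from the question.
columns = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 1, 2, 2], list("abcdefg")],
    names=["category", "source"],
)
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7], [4, 5, 6, 7, 8, 9, 10]],
    index=pd.Index(["l1", "l2"], name="label"),
    columns=columns,
)

# Join the levels of each column tuple into one string, e.g. (1, "a") -> "1 a".
flat = df.copy()
flat.columns = [" ".join(str(level) for level in col) for col in df.columns]

# to_csv() without a path returns the CSV text instead of writing a file.
csv_text = flat.to_csv()
print(csv_text)
```

Working on a copy keeps the original multi-indexed frame available in case both layouts are needed.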
astaric