
I would like to show the random forest feature-selection results of two datasets in one bar plot, without the bars overlapping. My solution overlaps the two datasets. Can I show them side by side?

import pandas as pd
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

XData2 = Data2.drop(['Label'], axis=1)
yData2 = Data2['Label']
Xtrain, Xtest, ytrain, ytest = train_test_split(XData2, yData2, test_size=0.33, random_state=42)
RF_featuresData2 = RandomForestClassifier(n_estimators=100, random_state=0)
RF_featuresData2.fit(Xtrain, ytrain)

feature_scoresData2 = pd.Series(RF_featuresData2.feature_importances_, index=Xtrain.columns).sort_values(ascending=False)

sns.barplot(x=feature_scoresData2, y=feature_scoresData2.index)

[Image: feature-importance bar plot for Data2]

I did the same steps for Data1, with this result:

XData1 = Data1.drop(['Label'], axis=1)
yData1 = Data1['Label']
X_train, X_test, y_train, y_test = train_test_split(XData1, yData1, test_size=0.33, random_state=42)
RF_featuresData1 = RandomForestClassifier(n_estimators=100, random_state=0)
RF_featuresData1.fit(X_train, y_train)

feature_scoresData1 = pd.Series(RF_featuresData1.feature_importances_, index=X_train.columns).sort_values(ascending=False)

sns.barplot(x=feature_scoresData1, y=feature_scoresData1.index)

[Image: feature-importance bar plot for Data1]

These are the plots I made. My goal is to combine them into one plot, like the example on the seaborn documentation page, but for two datasets.

Example: [Image: example bar plot from the seaborn documentation page]

  • Do you want to create a plot like your attached image? – I'mahdi Jun 21 '22 at 15:00
  • Sorry, there is a misunderstanding. That image is the plot I made. I would like to show the datasets in one plot, but separately, like in the seaborn documentation page but for two datasets. – linuxpanther Jun 21 '22 at 15:03
  • add a column with the same name to each dataframe, where the value is a unique identifier like `'data1'` and `'data2'`, combine the two dataframes, and plot with seaborn and use `hue='...'` with the name of the added column. – Trenton McKinney Jun 21 '22 at 15:19
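The approach suggested in the last comment can be sketched end to end as follows: the per-dataset steps from the question are wrapped in a helper, each result gets a `ds` label column, the two frames are concatenated, and `hue='ds'` draws the bars side by side. This is a minimal sketch under assumptions: the synthetic data from `make_classification` stands in for `Data1`/`Data2`, and the helper names `make_dataset` and `rf_feature_scores` and the column names `feature` and `importance` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_dataset(seed):
    """Hypothetical stand-in for Data1/Data2: 5 features plus a 'Label' column."""
    X, y = make_classification(n_samples=200, n_features=5, random_state=seed)
    df = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
    df["Label"] = y
    return df


def rf_feature_scores(data):
    """Same steps as in the question: split, fit a random forest, return importances."""
    X = data.drop(["Label"], axis=1)
    y = data["Label"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=42)
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)
    return pd.Series(rf.feature_importances_, index=X_train.columns)


Data1, Data2 = make_dataset(1), make_dataset(2)

# Long format: one row per (feature, dataset) pair, tagged by the 'ds' column
frames = []
for name, data in [("dataset_1", Data1), ("dataset_2", Data2)]:
    scores = rf_feature_scores(data).rename_axis("feature").reset_index(name="importance")
    scores["ds"] = name
    frames.append(scores)
dss = pd.concat(frames, ignore_index=True)

# hue='ds' draws the two datasets' bars next to each other instead of on top of each other
ax = sns.barplot(x="importance", y="feature", hue="ds", data=dss)
plt.tight_layout()
```

Keeping the data in long format (one row per feature and dataset) is what lets seaborn map the `ds` column to `hue`.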



How about adding a column to each dataframe that holds the dataframe's name, then concatenating them and calling sns.barplot with hue, like in your question:

import seaborn as sns
import pandas as pd

ds1 = pd.DataFrame({'day':['Thur','Fri', 'Sat', 'Sun', 'Thur','Fri', 'Sat', 'Sun'],
                   'total_bill': [17,14,19,18, 22, 16, 25, 27]})

ds2 = pd.DataFrame({'day':['Thur','Fri', 'Sat', 'Sun', 'Thur','Fri', 'Sat', 'Sun'],
                   'total_bill': [18,22,25,27, 24,19,28,30]})

ds2['ds'] = 'dataset_2'
ds1['ds'] = 'dataset_1'
dss = pd.concat([ds1, ds2])
sns.barplot(x='day', y='total_bill', hue='ds', data=dss)

Output:

[Image: bar plot of total_bill per day, with the dataset_1 and dataset_2 bars side by side]
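As a variation (not part of the original answer), `pd.concat` can add the identifier column itself: when it receives a dict, the dict keys become the outer level of a MultiIndex, and `reset_index` turns that level into a regular column. The sketch reuses the `ds1`/`ds2` example data above; the level names `'ds'` and `'row'` are assumptions chosen for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd
import seaborn as sns

ds1 = pd.DataFrame({'day': ['Thur', 'Fri', 'Sat', 'Sun', 'Thur', 'Fri', 'Sat', 'Sun'],
                    'total_bill': [17, 14, 19, 18, 22, 16, 25, 27]})
ds2 = pd.DataFrame({'day': ['Thur', 'Fri', 'Sat', 'Sun', 'Thur', 'Fri', 'Sat', 'Sun'],
                    'total_bill': [18, 22, 25, 27, 24, 19, 28, 30]})

# Dict keys become the outer index level 'ds'; the original row labels become level 'row'
dss = (pd.concat({'dataset_1': ds1, 'dataset_2': ds2}, names=['ds', 'row'])
         .reset_index(level='ds')    # move 'ds' from the index into a column
         .reset_index(drop=True))    # discard the leftover 'row' level

ax = sns.barplot(x='day', y='total_bill', hue='ds', data=dss)
```

This avoids assigning the `ds` column by hand, which helps when more than two datasets are combined.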

Edit:

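# Turn each importance Series (index = feature names) into a DataFrame with named columns
ds1 = feature_scoresData1.rename_axis('feature').reset_index(name='importance')
ds2 = feature_scoresData2.rename_axis('feature').reset_index(name='importance')
# Label the rows with the dataset they came from
ds1['ds'] = 'dataset_1'
ds2['ds'] = 'dataset_2'
dss = pd.concat([ds1, ds2], ignore_index=True)
# Horizontal bars as in the question; hue='ds' puts the two datasets side by side
sns.barplot(x='importance', y='feature', hue='ds', data=dss)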
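Another option, offered as an alternative rather than the answer's approach: put the two importance Series side by side as columns of one DataFrame, which pandas aligns on the feature names, and let `DataFrame.plot.barh` draw grouped bars. The small Series below are made-up placeholder numbers standing in for `feature_scoresData1` and `feature_scoresData2`; real values come from the fitted models.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Placeholder importances (hypothetical numbers) with feature names as the index
feature_scoresData1 = pd.Series({'f0': 0.40, 'f1': 0.35, 'f2': 0.25})
feature_scoresData2 = pd.Series({'f2': 0.50, 'f0': 0.30, 'f1': 0.20})

# Columns align on the feature-name index, so differently sorted Series still line up;
# a feature present in only one dataset would get NaN in the other column
wide = pd.DataFrame({'dataset_1': feature_scoresData1,
                     'dataset_2': feature_scoresData2})
wide = wide.sort_values('dataset_1')  # optional: order the groups by dataset_1 importance

# One group per feature, one bar per column (dataset), drawn next to each other
ax = wide.plot.barh()
ax.set_xlabel('feature importance')
```

The wide layout skips the concat step entirely, at the cost of using pandas' plotting instead of seaborn's styling.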
I'mahdi
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/245796/discussion-between-linuxpanther-and-imahdi). – linuxpanther Jun 21 '22 at 16:11