I get the following error:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['Japan', 'Italy', 'Spain', 'Norway', 'Mexico'], dtype='object', name='Country'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"

I have searched for days and am at a complete loss as to what it means and how to fix it. I can post my code if that would help in deciphering it.

Any help or insight would be greatly appreciated. I'm a beginner at Python, so please bear with me if the solution has been there all along.

# We have 11 clusters, which fit on a 3x4 grid of plots with the last plot disabled
fig, ax = plt.subplots(int(len(clusters)/4)+1,4,figsize=(13,10))

ax = ax.ravel() # This makes iterating over axes simpler 

for i, (cluster, axes) in enumerate(zip(clusters, ax)): # One plot per centroid
    # Pull out cluster genre data
    indices = subset[db_clust == cluster].index
    cluster_data = data.loc[indices]
    
    # Pull out counts per country, greatest first, also get number of active bands per country
    count = cluster_data.groupby('Country')['GenreTerms'].count().sort_values(ascending=False)
    count_active = cluster_data[cluster_data['Status'] == 'Active'].groupby('Country')['GenreTerms'].count()
    # Pull out top 3 most common terms
    term_count = count_terms(cluster_data['GenreTerms'])
    top_terms = term_count['Term'][:3]

    # Define color for cluster
    color = colors[cluster]
    # Define y-axis coordinates
    coords = np.arange(10, 0, -1)
    # kwargs for barh
    bar_kw = {'height': 0.5, 'color': color, 'align': 'center'}
    # Plot bars representing only active bands (note using index of original data)
    axes.barh(bottom=coords, width=count_active[count[:10].index], **bar_kw, alpha=0.75, lw=0, label='Active')
    # overlay horizontal bars representing all bands
    axes.barh(bottom=coords, width=count[:10], **bar_kw, alpha=0.25, label='Non-Active')
    if i == 0:
        axes.legend(frameon=False, loc=0)
    # Set title using 3 most common terms from cluster
    axes.set_title('C({}): {}, {}, {}'.format(cluster, *top_terms))
    # Format country labels, shorten longer names for more compact layout
    axes.set_yticks(coords)
    ticklabels = count[:10].index
    ticklabels = ticklabels.str.replace('United States', 'US').str.replace('United Kingdom', 'UK')
    axes.set_yticklabels(ticklabels, ha='right', va='center')

    # Format plot
    axes.set_ylim(0.25, 10.75)
    axes.grid(False, axis='y')  # turn off y-axis grid lines

# Hide last plot
ax[-1].set_axis_off()

fig.tight_layout()
fig.suptitle('# of Bands per top 10 countries in each cluster', y=1.025, fontsize=16, weight='bold')
  • Yes, please do post your code, as a minimal reproducible example. – tripleee Apr 02 '21 at 18:54
  • Hi Tripleee, thank you for helping. Where would you like my code? This text field is limited in characters, so not all of it will fit. Sorry for my ignorance, I am new to Stack Overflow :-) – Kim Newton Kassebeer Apr 02 '21 at 19:02
  • See https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples. It gives you an idea how to create a small example that illustrates your problem without needing a huge DataFrame – ALollz Apr 02 '21 at 19:16
  • Okay. But I have no idea which part of the code I should copy and paste; that is, I don't know which part of it is relevant to solving the problem. Should I not just paste it as it is? – Kim Newton Kassebeer Apr 02 '21 at 19:24
  • for i, (cluster, axes) in enumerate(zip(clusters, ax)): # One plot per centroid # Pull out cluster genre data indices = subset[db_clust == cluster].index cluster_data = data.loc[indices] – Kim Newton Kassebeer Apr 02 '21 at 19:25
  • Maybe this is where the error originates, perhaps? – Kim Newton Kassebeer Apr 02 '21 at 19:26
  • Okay, I have absolutely no idea what you all mean by minimal reproducible example...just forget it. I'll check GitHub instead. Thank you for your time :-) – Kim Newton Kassebeer Apr 02 '21 at 19:58
  • Kim, you can edit your original post to include code. – Sam Szotkowski Apr 02 '21 at 20:14
  • Hi Sam, if I try to post the code, the Stack Overflow page throws the following error: "Your post appears to contain code that is not properly formatted as code. Please indent all code by 4 spaces using the code toolbar button or the CTRL+K keyboard shortcut. For more editing help, click the [?] toolbar icon." I'm new to all this. I just want to show my code... – Kim Newton Kassebeer Apr 02 '21 at 20:30
  • When you're pasting the code, just put 3 backticks ``` above and below it, or one ` before and after if it's inline. Or highlight your code and do ctrl+k like it says. Do whatever you did to write the KeyError – Sam Szotkowski Apr 02 '21 at 20:33
  • Thank you, Sam - I managed to add the code by using ```. – Kim Newton Kassebeer Apr 02 '21 at 20:37
  • See my updated example. As I said, it would help to know exactly what you're trying to get out of which dataframe, leaving out all the irrelevant code. – Sam Szotkowski Apr 02 '21 at 21:05
  • Please [don’t post images of code or error messages.](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557) – tripleee Apr 03 '21 at 06:36

2 Answers


I don't know what columns are in your db_clust, subset, or data dataframes, and I'm curious whether you really need 3+ separate dataframes. But let's suppose subset is a dataframe with a column called 'Cluster Names', and you want to create a dataframe containing only the rows whose Cluster Name matches your cluster variable. You could achieve this with:

for i, (cluster, axes) in enumerate(zip(clusters, ax)): # One plot per centroid
    # Pull out cluster genre data
    cluster_data = subset[subset['Cluster Names'] == cluster]

See: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
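
As a rough, self-contained sketch of the boolean-mask selection above: the dataframe contents and the 'Band' column are made up for illustration, and 'Cluster Names' is only the column name assumed in this answer, not something confirmed by the question.

```python
import pandas as pd

# Hypothetical data; the real `subset` dataframe is not shown in the question.
subset = pd.DataFrame({
    'Band': ['A', 'B', 'C', 'D'],
    'Cluster Names': [0, 1, 0, 2],
})

cluster = 0

# Comparing a column to a value gives a boolean Series (the mask) ...
mask = subset['Cluster Names'] == cluster

# ... and indexing with that mask keeps only the rows where it is True.
cluster_data = subset[mask]
print(cluster_data)
```

Because the mask is built from the same dataframe it filters, no label lookup is involved, so this selection cannot fail with a missing-label KeyError.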

P.S. What the commenters mean by Minimal Reproducible Example is that everything regarding the plots and calculations is irrelevant to your question, while important context (such as what your dataframes contain) is missing.

Sam Szotkowski
  • Hi Sam - thank you so much for your suggestion. Unfortunately it did not seem to do the trick. I'm trying to get this wonderful notebook up and running: https://jonchar.net/notebooks/MA-Exploratory-Analysis/ I have the freshly scraped data in place (.csv file), and everything seems to be working fine until the error occurs. I thought this little GitHub project would be a great learning experience for me, but it has given me a headache :-/ – Kim Newton Kassebeer Apr 02 '21 at 21:07
  • I see. Try keeping the code exactly how it is in the notebook, but change .loc to .iloc (locate by integer position) – Sam Szotkowski Apr 02 '21 at 21:22

The list of labels you are selecting with contains labels that are not in the index of the object you are selecting from.

See this: "Passing list-likes to .loc or [] with any missing labels is no longer supported"
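
A minimal sketch of how this error arises and one way around it. The two Series below are hypothetical stand-ins for the notebook's `count` and `count_active`. The assumption is that the lookup `count_active[count[:10].index]` is what fails, because some top-10 countries have no active bands and so never appear in `count_active`. The pandas page linked in the error message suggests `.reindex` for this case.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's per-country counts.
count = pd.Series({'Germany': 5, 'Japan': 3, 'Italy': 2})
count_active = pd.Series({'Germany': 4})  # Japan and Italy have no active bands

top = count.head(10).index

# Selecting with labels that are missing from the index raises KeyError.
raised = False
try:
    count_active[top]
except KeyError as err:
    raised = True
    print('KeyError:', err)

# reindex accepts missing labels and fills them with a value of our choosing.
active_top = count_active.reindex(top, fill_value=0)
print(active_top)
```

Filling with 0 fits this plot because a country with no active bands should get a zero-width bar. If the missing countries should be dropped instead, selecting with `count_active.index.intersection(top)` is another option.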

RajeshM