I am using pandas 1.4.4 and Python 3.9.13 in a Jupyter notebook, and I am trying to keep my print output from being truncated.

My code is below:

for column in categorical_cols[1:]:
    unique_values = df[column].value_counts()
    non_null_values = df[column].count()
    print(f'Column: {column} - Non-null values: {non_null_values}')
    print(unique_values)
    print()

Basically, categorical_cols is the index of the column names that hold non-numerical values. The code for it is below.

categorical_cols = df.select_dtypes(include=['object']).columns

With the for loop I am trying to find the unique values in each categorical column, which works fine. The only problem is that my output in Jupyter is truncated. Here is a screenshot.

Some info about df: Int64Index: 34434 entries, 0 to 34433 Data columns (total 51 columns)

I have tried the various pandas.set_option calls below, but they do not seem to work:

  • pd.set_option('display.max_rows', None)
  • pd.set_option('display.max_columns', None)
  • pd.set_option('display.pprint_nest_depth', 10)
  • pd.set_option('display.large_repr', 'info')
  • The reason none of the pandas.set_option calls you list are working is that pandas dataframes aren't meant to be accessed the way you are doing it. You are using brute-force Python to get at elements of the dataframe with the code you've shown, so no matter what you set those options to, they aren't going to affect your `print` statements. I suspect there are pandas-native ways to do this, but you are posting about your idea of the solution and making this an [XY Problem](https://xyproblem.info/). For example, as God is One's answer hints, those options would affect .... – Wayne Jun 16 '23 at 04:14
  • the output of `display(df)` (if in Jupyter) or `df` or `print(df)` (if not in Jupyter), if they were set before running that. See [here](https://stackoverflow.com/a/30691921/8508004) or [here](https://stackoverflow.com/a/38489412/8508004). – Wayne Jun 16 '23 at 04:15
  • Wayne, thank you for your comment. Please explain what you mean by brute force Python and what I am doing wrong in this case. Also, what are the pandas-native ways to do this that you mentioned? – Nick Kobets Jun 16 '23 at 18:05
  • Basically, those listed options are for displaying a pandas dataframe, not derived things like counts. (Maybe a pandas groupby object qualifies too, but I'd have to check.) You wouldn't loop with `for column in categorical_cols[1:]:`, print things, and expect the listed options to influence the output. I don't know where `categorical_cols` comes from because you don't show us. Generally, when dealing with dataframes you don't loop the way you are doing; if you are, it's a hint that something is off about your approach. Dataframes are like vectors and dataframes in R, where you generally don't run `for` loops. – Wayne Jun 16 '23 at 18:44
  • But take those comments with a grain of salt. Sometimes using what you know gets you to the answer you need. That's fine, but then you cannot expect the options specific to the pandas approaches to apply. I may be a bit off base, though, because you aren't sharing a lot of your code with us. It's always best to include a toy example that parallels yours and that others can run. Then we can edit it and return a working example that you can adapt to your specific case. I think with the links I provided and God is One's suggestions you can probably do what you need now? – Wayne Jun 16 '23 at 18:48
  • And even what I'm saying isn't 100%. I'll loop through groupby groups because that is what the documentation says [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#iterating-through-groups). And I will iterate with `df.iterrows()` and `df.itertuples()`. The final takeaway is that I don't see how your single shared block of code relates to the list of `pandas.set_option` calls you have provided towards the bottom of your post, although that list looks related to the title. So it is hard to tell what your question is. God is One's suggestions and my links should get at your title. – Wayne Jun 16 '23 at 18:57
  • @Wayne Thank you for the detailed explanation. Now I understand that my approach with pd.set_option will not work with this loop. I have added some additional information and a screenshot to my question; I hope that clarifies the matter. – Nick Kobets Jun 16 '23 at 22:10
  • Ohhhh. In this case your description of 'truncated' didn't get the point across. Plus, usually we expect you to be asking why Jupyter is only showing the beginning and end of the dataframe. The screenshot helps a lot in this rare case. Still, I don't think your results are truncated either? You just need to use the scroll bar on the right to see all the results. Or, as another option, select the text and copy it elsewhere (probably to your favorite code editor, like VS Code or Sublime Text). I'm going to propose a different option as an answer ... – Wayne Jun 17 '23 at 14:20
  • because that mode of view of the cell sometimes gets activated with very long output to make the rendering easier for Jupyter. I think you can usually toggle it off and into another mode by clicking on the left side of the view window for the output of that cell to cycle through the different views. – Wayne Jun 17 '23 at 14:21
  • @Wayne Thank you again. Toggling the view window on the left worked perfectly. Sorry about not being clear with my question, I am still relatively new to this. And thanks again for sticking with me on this. – Nick Kobets Jun 18 '23 at 16:05
  • Glad we cleared it up. Enjoy Jupyter. – Wayne Jun 18 '23 at 19:48

1 Answer

The screenshot that has been added indicates that by 'truncated' you seem to mean that the 'scrolled' view has been activated.

See this post and its answers to get an idea of what is going on here and how you can change it. See the bottom of this answer for more about this view mode, which gets activated in the classic notebook if you print a lot of output.

This happens when you print a lot of output to your cell. I speculate it may be an old, cautious mechanism built into Jupyter to better handle a lot of output back when computers generally weren't as powerful. Note that this doesn't happen in modern JupyterLab, so one solution is to start using JupyterLab. (I don't know what Jupyter Notebook version 7 or higher does. I'm assuming that nbclassic is what you are using. See here if 'nbclassic' and 'Jupyter Notebook Version 7' don't make sense to you.)

Other solutions if you want to keep using nbclassic:

  • Toggle that mode off by clicking on the left-hand side of the cell's output area.
  • Use IPython/Jupyter's %%capture and %store magics, as illustrated here: add %%capture out to the top of the cell that produces a lot of output. Then, in the next cell, run %store out.stdout >my_ton_of_text.txt to save all the output to a file, which you can open in Jupyter or your own text editor to peruse.
  • Edit your cell's code so that Python accumulates a string instead of printing it, and then use the %store magic covered above to save it as a file.
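
As a rough sketch of that last idea, the block below collects everything a function prints into one string and writes it to a file. It swaps plain Python (contextlib.redirect_stdout and pathlib) in for the %store magic, so it is a variation on the suggestion rather than the exact method above. The helper name capture_printed_output, the stand-in function noisy, and the choice of the temp directory are assumptions made up for illustration; only the file name my_ton_of_text.txt comes from the bullet above.

```python
import contextlib
import io
import tempfile
from pathlib import Path


def capture_printed_output(func):
    """Run func() and return everything it printed as one string."""
    buffer = io.StringIO()
    # Anything print() sends to stdout inside this block lands in the buffer
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


def noisy():
    # Hypothetical stand-in for a cell that prints a lot of output
    for x in range(2000):
        print(x)


captured = capture_printed_output(noisy)

# Assumption: saving into the temp directory; any writable path would do
out_path = Path(tempfile.gettempdir()) / "my_ton_of_text.txt"
out_path.write_text(captured, encoding="utf-8")

print(f"Saved {len(captured.splitlines())} lines to {out_path}")
```

In your case the body of noisy would be replaced by your own loop over categorical_cols, and the saved file can then be opened in Jupyter or any text editor.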

Details on scrolled view

If I run the following code in a new notebook, I get the mode you show in your screenshot, because there is a lot of output:

for x in range(2000):
    print(x)

If I then save the notebook file and open it in a text editor, I see the following in the JSON of the .ipynb file:

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e0c96475",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "1\n",
      "2\n",
      "3\n",

Note the "scrolled": true in the cell's metadata.
You can cycle through the different view options by clicking on the left side of the cell output area. One of the views turns scrolling off.
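
If you would rather inspect or change that flag programmatically, here is a minimal sketch that works on the notebook's JSON structure. It uses a small in-memory stand-in for the .ipynb contents; the helper names scrolled_cells and set_scrolled_off are hypothetical. It assumes, as far as I understand the notebook format, that false is an accepted value for "scrolled"; how a given frontend treats the value after reloading is not something I have verified.

```python
import json

# A tiny in-memory stand-in for the parsed contents of an .ipynb file
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"scrolled": True},
            "outputs": [],
            "source": ["for x in range(2000):\n", "    print(x)"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "source": ["print('short')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}


def scrolled_cells(nb):
    """Return the indexes of cells whose metadata has "scrolled": true."""
    return [
        i
        for i, cell in enumerate(nb["cells"])
        if cell.get("metadata", {}).get("scrolled") is True
    ]


def set_scrolled_off(nb):
    """Set "scrolled" to false on every cell that currently has it true."""
    for i in scrolled_cells(nb):
        nb["cells"][i]["metadata"]["scrolled"] = False
    return nb


before = scrolled_cells(notebook)
set_scrolled_off(notebook)
after = scrolled_cells(notebook)

# With a real file you would json.load() it first and json.dump() it back
serialized = json.dumps(notebook, indent=1)
print(before, after)
```

Work on a copy of the notebook file if you try this on real data, since clicking the output area in the interface does the same thing with less risk.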

Wayne