
How can I save the output of the pandas df.info() function to a variable or a dataframe? I tried using a buffer, but the output is not saved neatly.

Code:

import io
buffer = io.StringIO()
df.info(buf=buffer)
s = buffer.getvalue()
with open("df_info.txt", "w", encoding="utf-8") as f:
    f.write(s)

Result:

Sample output:

column        non-null count dtype 

The output should contain only the three columns shown in the sample above.

How can I do this?

mkrieger1
  • From the variable name `df` I am inferring you're asking about the `.info()` method of `pandas` dataframes? This function is specifically intended to print information to the console, but all the information it provides can be accessed directly on the dataframe with other methods and attributes - what exactly do you need from this block that makes you want to capture it? – Grismar Jan 17 '22 at 23:06
  • Also, the solution you found would be the best solution to do it (except that you want to get the value after writing) - you say it's not formatted neatly, but it's literally writing the output to the text file, so there's no neater way to get it. Do you realise that the console uses a fixed width font instead of a proportional font like Excel does? Have you tried looking at the output in Notepad or another text editor? What exactly is not neat about it in that setting? – Grismar Jan 17 '22 at 23:10
  • Does this answer your question? [how to save a pandas DataFrame to an excel file?](https://stackoverflow.com/questions/55170300/how-to-save-a-pandas-dataframe-to-an-excel-file) – mkrieger1 Jan 17 '22 at 23:11
  • @mkrieger1 that would save the dataframe's *contents* to an Excel file, OP appears to be asking how to get the dataframe's *metadata* into Excel (which will come down to fixed-width parsing of the text output, most likely - unless they just directly access the metadata and write it to a .csv) – Grismar Jan 17 '22 at 23:12
  • Oh, I mistook the "table" as the dataframe contents. – mkrieger1 Jan 17 '22 at 23:13
  • I have multiple dataframes. I want to save the info() output of all of them, one after the other. Is it not possible to write the output to a df with only the column, non-null count and dtype? – crazycoders Jan 17 '22 at 23:16
  • What would "neat" output look like? Can you make a sample dataframe and some sample output? – Henry Ecker Jan 18 '22 at 00:25
  • @HenryEcker, neat output would contain only the column, non-null count and dtype. I am attaching the sample output above – crazycoders Jan 18 '22 at 13:17
  • attached the sample output – crazycoders Jan 18 '22 at 13:22
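
Grismar's suggestion above, reading the metadata directly instead of parsing the printed text, could look roughly like the sketch below. The helper name `summarize` and the sample dataframe are made up for illustration; only `DataFrame.columns`, `DataFrame.count()` and `DataFrame.dtypes` come from pandas itself.

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Build a column / non-null count / dtype table from the dataframe itself.

    Hypothetical helper, not part of pandas.
    """
    return pd.DataFrame({
        "Column": list(frame.columns),
        # count() returns the number of non-null values per column
        "Non-Null Count": frame.count().to_numpy(),
        # dtypes converted to strings so they are easy to write to a file
        "Dtype": frame.dtypes.astype(str).to_numpy(),
    })


# Made-up sample data, only for demonstration
sample = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, None, 2.5]})
summary = summarize(sample)
print(summary)
```

Because the result is an ordinary dataframe, it can then be written out with `summary.to_csv(...)` or similar, and column names containing spaces are not a problem since nothing is parsed from text.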

1 Answer


Use splitlines to get a list of lines, then use indexing to drop the first 5 lines and the last 2, split each remaining line on whitespace, and pass the result to the DataFrame constructor:

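import io
import pandas as pd

buffer = io.StringIO()
df.info(buf=buffer)
lines = buffer.getvalue().splitlines()
# lines[3] is the header row, lines[5:-2] are the per-column rows
# stored under a new name so the original df is not overwritten
info_df = (pd.DataFrame([x.split() for x in lines[5:-2]], columns=lines[3].split())
           .drop('Count', axis=1)
           .rename(columns={'Non-Null': 'Non-Null Count'}))
print(info_df)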
jezrael
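
For the follow-up in the comments about several dataframes, the same parsing idea can be wrapped in a function and the results stacked. This is only a sketch under a few assumptions: the function name `info_to_frame` and the sample dataframes are invented, the `#` column is dropped so that only the three requested columns remain, the tabular `info()` layout of recent pandas versions is assumed, and column names must not contain whitespace (otherwise `split()` produces too many fields).

```python
import io

import pandas as pd


def info_to_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Parse the text printed by frame.info() into a dataframe (hypothetical helper)."""
    buffer = io.StringIO()
    frame.info(buf=buffer)
    lines = buffer.getvalue().splitlines()
    # lines[3] is the header row, lines[5:-2] are the per-column rows
    table = pd.DataFrame([x.split() for x in lines[5:-2]],
                         columns=lines[3].split())
    # "Non-Null Count" splits into two fields; "Count" then holds the
    # literal text "non-null", so it is dropped together with "#"
    return (table.drop(columns=["#", "Count"])
                 .rename(columns={"Non-Null": "Non-Null Count"}))


# Made-up sample dataframes, only for demonstration
frames = {
    "first": pd.DataFrame({"a": [1, 2, 3], "b": [1.5, None, 2.5]}),
    "second": pd.DataFrame({"c": ["x", "y"]}),
}

# One block per dataframe, labelled by its name in the outer index level
combined = pd.concat({name: info_to_frame(f) for name, f in frames.items()},
                     names=["frame", "row"])
print(combined)
```

Note that every value parsed this way is a string, including the counts, so convert with `astype(int)` if numbers are needed.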