Programmatically convert pandas dataframe to markdown table

Question

I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:

+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+

As seen there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database I think). I need to get the contents of this Dataframe output as a Markdown table, but without getting problems with encoding. I followed this answer (from the question Generate Markdown tables?) and got the following:

import sys, sqlite3

db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()

rows = []
for index, row in df.iterrows():
    items = (row['date'], 
             row['path'], 
             row['language'], 
             row['shortest_sentence'],
             row['longest_sentence'], 
             row['number_words'], 
             row['readability_consensus'])
    rows.append(items)

headings = ['Date', 
            'Path', 
            'Language',
            'Shortest Sentence', 
            'Longest Sentence since', 
            'Words',
            'Grade level']

fields = [0, 1, 2, 3, 4, 5, 6]
align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
         ('^','^'), ('^','^')]

table(sys.stdout, rows, fields, headings, align)

However, this yields an UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128) error. How can I output the Dataframe as a Markdown table? That is, for the purpose of storing this code in a file for use in writing a Markdown document. I need the output to look like this:

| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
|----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |

Try this: `new_string = string.decode('latin1')`, where string = "Høgskolen i Østfold er et eksempel..." In other words use `decode('latin1')` on your string to decode Norwegian characters. — Rohit, Oct 17 '15 at 06:16
I set up a switch structure for decoding when ``language == 'Nor'``: enc = 'latin1' items = (row['date'].decode(enc), row['path'].decode(enc), row['language'].decode(enc), row['shortest_sentence'].decode(enc), row['longest_sentence'].decode(enc) ... But still get: ``UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128)``, specifically on ``row['longest_sentence'].decode(enc)``. — OleVik, Oct 17 '15 at 12:08

timvink · Answer 1 · 2020-08-14T11:42:07.143

44

Pandas 1.0 was released 29 january 2020 and supports markdown conversion, so you can now do this directly!

Example taken from the docs:

df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())

|    |   A |   B |
|:---|----:|----:|
| a  |   1 |   1 |
| a  |   2 |   2 |
| b  |   3 |   3 |

Or without the index:

print(df.to_markdown(index=False)) # use 'showindex' for pandas < 1.1

|   A |   B |
|----:|----:|
|   1 |   1 |
|   2 |   2 |
|   3 |   3 |

edited Aug 14 '20 at 11:42

answered Jan 31 '20 at 08:58

timvink

636
6
7

Any way to remove the index when converting to markdown? Passing `index=False` didn't work. – stefanbschneider Mar 13 '20 at 15:29
6

Yes: `df.to_markdown(showindex=False)`. Pandas uses [tabulate](https://github.com/astanin/python-tabulate), so you can pass tabulate parameters via to_markdown(). – timvink Mar 15 '20 at 13:32
FutureWarning: `showindex` is deprecated. Only 'index' will be used in a future version. – Carson Aug 13 '20 at 02:53
Thanks, indeed futurewarning shows up with `pandas>=1.1`. Updated answer. – timvink Aug 14 '20 at 11:40

score 40 · Answer 2 · edited Mar 24 '22 at 10:26

40

Improving the answer further, for use in IPython Notebook:

def pandas_df_to_markdown_table(df):
    from IPython.display import Markdown, display
    fmt = ['---' for i in range(len(df.columns))]
    df_fmt = pd.DataFrame([fmt], columns=df.columns)
    df_formatted = pd.concat([df_fmt, df])
    display(Markdown(df_formatted.to_csv(sep="|", index=False)))

pandas_df_to_markdown_table(infodf)

Or use tabulate:

pip install tabulate

Examples of use are in the documentation.

Update

As of pandas 1.0 DataFrame to markdown is available. Please see answer from @timvink (docs)

edited Mar 24 '22 at 10:26

ap14

4,393
1
15
30

answered Nov 23 '15 at 10:46

kpykc

663
6
8

6

To use tabulate, I used `print(tabulate.tabulate(df.values,df.columns, tablefmt="pipe"))` – Josiah Yoder Aug 28 '18 at 17:03
I changed last line of function from `to_csv` to `to_clipboard` so that I could paste the results easily. – scottlittle Jun 17 '19 at 19:38
2

`df.to_markdown()` is available now on Pandas which is using `tabulate`. – Chingiz K. Feb 15 '22 at 21:15

score 39 · Answer 3 · edited Mar 29 '19 at 09:05

I recommend python-tabulate library for generating ascii-tables. The library supports pandas.DataFrame as well.

Here is how to use it:

from pandas import DataFrame
from tabulate import tabulate

df = DataFrame({
    "weekday": ["monday", "thursday", "wednesday"],
    "temperature": [20, 30, 25],
    "precipitation": [100, 200, 150],
}).set_index("weekday")

print(tabulate(df, tablefmt="pipe", headers="keys"))

Output:

| weekday   |   temperature |   precipitation |
|:----------|--------------:|----------------:|
| monday    |            20 |             100 |
| thursday  |            30 |             200 |
| wednesday |            25 |             150 |

You can do this with `tablefmt="pipe"`. That PR was declined and there is no `tablefmt="markdown"`. — Tomasz Sętkowski, Dec 09 '17 at 13:26

score 11 · Answer 4 · edited May 25 '16 at 13:12

11

Try this out. I got it to work.

See the screenshot of my markdown file converted to HTML at the end of this answer.

import pandas as pd

# You don't need these two lines
# as you already have your DataFrame in memory
df = pd.read_csv("nor.txt", sep="|")
df.drop(df.columns[-1], axis=1)

# Get column names
cols = df.columns

# Create a new DataFrame with just the markdown
# strings
df2 = pd.DataFrame([['---',]*len(cols)], columns=cols)

#Create a new concatenated DataFrame
df3 = pd.concat([df2, df])

#Save as markdown
df3.to_csv("nor.md", sep="|", index=False)

edited May 25 '16 at 13:12

Simon Kamronn

198
6

answered Oct 17 '15 at 14:28

Rohit

5,840
13
42
65

Relative to my dataframe there were a couple extra separators, but running this: ``cols = df.columns df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols) df3 = pd.concat([df2, df]) df3.to_csv("nor.md", sep="|", index=False)`` Still gives a encoding error: ``UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128)``. Also, out of curiosity, would it be problematic that ``shortest_sentence`` and ``longest_sentence`` may contain line breaks? – OleVik Oct 17 '15 at 14:58
I do not think the line breaks should matter as long as they show up fine in pandas DataFrame. Pandas should be able to take in unicode characters just fine so I am not sure why this does not work. Check out this question: (http://stackoverflow.com/questions/7315629/python-encoding-string-swedish-letters). I do not know how close Swedish is to Norwegian but you may find something helpful. – Rohit Oct 17 '15 at 15:30
If you find an answer to your problem, do answer your own question for posterity. – Rohit Oct 17 '15 at 15:35
Through some further testing I was able to, and using the elegant solution you proposed. Thanks for your help! – OleVik Oct 17 '15 at 17:54

score 7 · Answer 5 · answered Sep 05 '16 at 16:45

Export a DataFrame to markdown

I created the following function for exporting a pandas.DataFrame to markdown in Python:

def df_to_markdown(df, float_format='%.2g'):
    """
    Export a pandas.DataFrame to markdown-formatted text.
    DataFrame should not contain any `|` characters.
    """
    from os import linesep
    return linesep.join([
        '|'.join(df.columns),
        '|'.join(4 * '-' for i in df.columns),
        df.to_csv(sep='|', index=False, header=False, float_format=float_format)
    ]).replace('|', ' | ')

This function may not automatically fix the encoding issues of the OP, but that is a different issue than converting from pandas to markdown.

I needed to modify this function to add `|` to the beginning and end of each line in the DataFrame to get proper markdown output. — abeboparebop, Dec 06 '19 at 16:47
Useful when usecase is to copy-paste a dataframe in markdown format. — yardstick17, Mar 02 '21 at 10:33

score 6 · Answer 6 · answered Nov 22 '16 at 00:30

I have tried several of the above solutions in this post and found this worked most consistently.

To convert a pandas data frame to a markdown table I suggest using pytablewriter. Using the data provided in this post:

import pandas as pd
import pytablewriter
from StringIO import StringIO

c = StringIO("""ID, path,language, date,longest_sentence, shortest_sentence, number_words , readability_consensus 
0, data/Eng/Sagitarius.txt , Eng, 2015-09-17 , With administrative experience in the prepa... , I am able to relocate internationally on short not..., 306, 11th and 12th grade
31 , data/Nor/Høylandet.txt  , Nor, 2015-07-22 , Høgskolen i Østfold er et eksempel..., Som skuespiller har jeg både..., 253, 15th and 16th grade
""")
df = pd.read_csv(c,sep=',',index_col=['ID'])

writer = pytablewriter.MarkdownTableWriter()
writer.table_name = "example_table"
writer.header_list = list(df.columns.values)
writer.value_matrix = df.values.tolist()
writer.write_table()

This results in:

# example_table
ID |           path           |language|    date    |                longest_sentence                |                   shortest_sentence                  | number_words | readability_consensus 
--:|--------------------------|--------|------------|------------------------------------------------|------------------------------------------------------|-------------:|-----------------------
  0| data/Eng/Sagitarius.txt  | Eng    | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...|           306| 11th and 12th grade   
 31| data/Nor/Høylandet.txt  | Nor    | 2015-07-22 | Høgskolen i Østfold er et eksempel...        | Som skuespiller har jeg både...                      |           253| 15th and 16th grade

Here is a markdown rendered screenshot.

Excellent, elegantly render in both editor and preview ! – Mithril Jun 16 '17 at 07:13 — Mithril, Jun 16 '17 at 07:13

score 2 · Answer 7 · answered May 03 '18 at 10:21

2

For those looking for how to do this using tabulate, I thought I'd put this here to save you some time:

print(tabulate(df, tablefmt="pipe", headers="keys", showindex=False))

answered May 03 '18 at 10:21

Anake

7,201
12
45
59

Peter Zagubisalo · Answer 8 · 2019-01-13T17:20:34.133

Yet another solution. This time via thin wrapper around tabulate: tabulatehelper

import numpy as np
import pandas as pd
import tabulatehelper as th

df = pd.DataFrame(np.random.random(16).reshape(4, 4), columns=('a', 'b', 'c', 'd'))
print(th.md_table(df, formats={-1: 'c'}))

Output:

|        a |        b |        c |        d |
|---------:|---------:|---------:|:--------:|
| 0.413284 | 0.932373 | 0.277797 | 0.646333 |
| 0.552731 | 0.381826 | 0.141727 | 0.2483   |
| 0.779889 | 0.012458 | 0.308352 | 0.650859 |
| 0.301109 | 0.982111 | 0.994024 | 0.43551  |

score 2 · Answer 9 · answered Jan 22 '20 at 05:45

2

Pandas have merged a PR to support df.to_markdown() method. You can find more details here It should be available soon.

answered Jan 22 '20 at 05:45

sam

2,263
22
34

score 1 · Accepted Answer · edited May 23 '17 at 11:53

Right, so I've taken a leaf from a question suggested by Rohit (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:

# Enforce UTF-8 encoding
import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('UTF-8')

# SQLite3 database
import sqlite3
# Pandas: Data structures and data analysis tools
import pandas as pd

# Read database, attach as Pandas dataframe
db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()
df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']

# Parse Dataframe and apply Markdown, then save as 'table.md'
cols = df.columns
df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
df3 = pd.concat([df2, df])
df3.to_csv("table.md", sep="|", index=False)

An important precursor to this is that the shortest_sentence and longest_sentence columns do not contain unnecessary line breaks, as removed by applying .replace('\n', ' ').replace('\r', '') to them before submitting into the SQLite database. It appears that the solution is not to enforce the language-specific encoding (ISO-8859-1 for Norwegian), but rather that UTF-8 is used instead of the default ASCII.

I ran this through my IPython notebook (Python 2.7.10) and got a table like the following (fixed spacing for appearance here):

| Path                    | Language | Date       | Shortest Sentence                                                                            | Longest Sentence                                                                                                                                                                                                                                         | Words | Readability Consensus |
|-------------------------|----------|------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|-----------------------|
| data/Eng/Something1.txt | Eng      | 2015-09-17 | I am able to relocate to London on short notice.                                             | With my administrative experience in the preparation of the structure and content of seminars in various courses, and critiquing academic papers on various levels, I am confident that I can execute the work required as an editorial assistant.       | 306   | 11th and 12th grade   |
| data/Nor/NoeNorrønt.txt | Nor      | 2015-09-17 | Jeg har grundig kjennskap til Microsoft Office og Adobe.                                     | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 205   | 18th and 19th grade   |
| data/Nor/Ørret.txt.txt  | Nor      | 2015-09-17 | Jeg håper på positiv tilbakemelding, og møter naturligvis til intervju hvis det er ønskelig. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 160   | 18th and 19th grade   |

Thus, a Markdown table without problems with encoding.

Calling `sys.setdefaultencoding('UTF-8')` is an ultra bad idea. It masks all kind of issues and it looks like your new code doesn't need it as you're not calling `table()`, which is full of implied encodings. — Alastair McCormack, Dec 26 '15 at 12:18

score 1 · Answer 11 · answered Dec 16 '16 at 09:05

Here's an example function using pytablewriter and some regular expressions to make the markdown table more similar to how a dataframe looks on Jupyter (with the row headers bold).

import io
import re
import pandas as pd
import pytablewriter

def df_to_markdown(df):
    """
    Converts Pandas DataFrame to markdown table,
    making the index bold (as in Jupyter) unless it's a
    pd.RangeIndex, in which case the index is completely dropped.
    Returns a string containing markdown table.
    """
    isRangeIndex = isinstance(df.index, pd.RangeIndex)
    if not isRangeIndex:
        df = df.reset_index()
    writer = pytablewriter.MarkdownTableWriter()
    writer.stream = io.StringIO()
    writer.header_list = df.columns
    writer.value_matrix = df.values
    writer.write_table()
    writer.stream.seek(0)
    table = writer.stream.readlines()

    if isRangeIndex:
        return ''.join(table)
    else:
        # Make the indexes bold
        new_table = table[:2]
        for line in table[2:]:
            new_table.append(re.sub('^(.*?)\|', r'**\1**|', line))    

        return ''.join(new_table)

score 1 · Answer 12 · answered Oct 31 '17 at 14:54

1

Using external tool pandoc and pipe:

def to_markdown(df):
    from subprocess import Popen, PIPE
    s = df.to_latex()
    p = Popen('pandoc -f latex -t markdown',
              stdin=PIPE, stdout=PIPE, shell=True)
    stdoutdata, _ = p.communicate(input=s.encode("utf-8"))
    return stdoutdata.decode("utf-8")

answered Oct 31 '17 at 14:54

Ilya Prokin

684
6
11

Opening another process seems vulnerable to unforeseen problems but for the simple case this is great. – Nolan Conaway Dec 27 '17 at 16:40

score 0 · Answer 13 · answered Dec 26 '15 at 12:16

sqlite3 returns Unicodes by default for TEXT fields. Everything was set up to work before you introduced the table() function from an external source (that you did not provide in your question).

The table() function has str() calls which do not provide an encoding, so ASCII is used to protect you.

You need to re-write table() not to do this, especially as you've got Unicode objects. You may have some success by simply replacing str() with unicode()

Programmatically convert pandas dataframe to markdown table

13 Answers13

Export a DataFrame to markdown

Linked