
I want to output a DataFrame to a string with a delimiter between the columns, but haven't found a way to do so (neither with to_string nor with any other method). Using a regex, for example by substituting whitespace with the preferred delimiter, does not work because my data includes strings (sentences). Example:

import pandas as pd

data = pd.DataFrame({
    'a': [1, 'This is some text', 3, 4, 5],
    'b': [1, 2, 3, 'This is also some text', 5]
})

string = data.to_string(header=False)

string
'0 1 1\n1 This is some text 2 \n2 3 3\n3 4 This is also some text\n4 5 5'

Replacing whitespace with the preferred delimiter inserts the delimiter between every word of each sentence, which is not what I want. Is there a way to output a DataFrame to a string while specifying the delimiter placed between the columns?
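One option worth trying: `DataFrame.to_csv` returns the CSV text as a string when no path is passed, and its `sep` parameter sets the delimiter placed between fields only, so the spaces inside the sentences are left alone. A minimal sketch, assuming a semicolon is an acceptable delimiter (the choice of `';'` is just for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    'a': [1, 'This is some text', 3, 4, 5],
    'b': [1, 2, 3, 'This is also some text', 5]
})

# With no path argument, to_csv returns the CSV text as a str.
# sep is written only between fields; header=False drops the column names,
# while the index is still written as the first field of each row.
string = data.to_csv(sep=';', header=False)
print(string)
```

If a field itself contains the delimiter, `to_csv` quotes that field by default (`quoting=csv.QUOTE_MINIMAL`), so the output stays unambiguous.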

My end goal is to concatenate two DataFrames of different sizes (and with different variables), essentially binding them row-wise, one on top of the other, and then to write the combined result to a .csv file. The reason I am not doing this directly on the DataFrames (e.g. via pandas.concat) is that concat aligns the columns, so every row of the combined DataFrame gets the same set of columns (with NaN values for variables not present in one or the other DataFrame). That is normally the preferred behaviour, but when writing to .csv it produces empty fields (e.g. ,,, where the NaN values in the DataFrame would be). I need to provide a .csv file that does not contain any such "blanks", and am therefore trying to achieve it through the above method.
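A minimal sketch of one way to reach that end goal without the NaN padding, assuming it is acceptable for rows in the output to have different numbers of fields: serialize each DataFrame on its own with `to_csv` (which returns a string when no path is given) and join the strings. The frames `df1` and `df2` are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical example frames with different sizes and different columns
df1 = pd.DataFrame({'a': [1, 2], 'b': ['some text', 'more text']})
df2 = pd.DataFrame({'c': [3, 4, 5], 'd': ['x', 'y', 'z'], 'e': [6, 7, 8]})

# Serialize each frame separately: every row carries only its own frame's
# fields, so there is no NaN padding and no empty ",," fields.
# Each to_csv string ends with a line terminator, so plain joining works.
parts = [df.to_csv(index=False, header=False) for df in (df1, df2)]
combined = ''.join(parts)
print(combined)
```

Writing `combined` to disk (e.g. with `open(...)` in `'w'` mode) then gives the file directly. Be aware that some CSV readers, `pd.read_csv` among them, may reject a file whose later rows have more fields than the first, so it is worth checking that whoever consumes the file accepts ragged rows.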

Any suggestions on how to achieve this are highly welcome!

R.W.
  • As of version 1.5, pandas will have `Styler.to_string`, which can do this. It exists on the master branch, so you can look at the code, but it won't have a public release for a few months. – Attack68 Jan 16 '22 at 21:50

2 Answers


Sorry to write this as an answer (I don't have enough reputation to comment), but have you tried writing both DataFrames to CSV files and then concatenating the files with cat, or writing both directly to the same file? (You may have to drop the header from one of them.)
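A minimal sketch of the "write both to the same file" idea, using the `mode='a'` argument of `to_csv` to append the second DataFrame. The frames `df1` and `df2` and the temporary output path are hypothetical; the header is kept for the first frame only, and `header=False` can be passed to both calls if no header is wanted at all.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example frames with different sizes and different columns
df1 = pd.DataFrame({'a': [1, 2], 'b': ['some text', 'more text']})
df2 = pd.DataFrame({'c': [3, 4, 5], 'd': ['x', 'y', 'z'], 'e': [6, 7, 8]})

# Hypothetical output path inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'combined.csv')

# The first frame creates (or overwrites) the file, including its header
df1.to_csv(path, index=False)
# The second frame is appended; header=False avoids writing df2's column names
df2.to_csv(path, mode='a', index=False, header=False)

with open(path) as f:
    contents = f.read()
print(contents)
```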


The easiest way is to use the tabulate package to add delimiters. As "Pretty Printing a pandas dataframe" explains:

import pandas as pd
from tabulate import tabulate

data = pd.DataFrame({
    'a': [1, 'This is some text', 3, 4, 5],
    'b': [1, 2, 3, 'This is also some text', 5]
})

print(tabulate(data, tablefmt='jira'))

produces

| 0 | 1                 | 1                      |
| 1 | This is some text | 2                      |
| 2 | 3                 | 3                      |
| 3 | 4                 | This is also some text |
| 4 | 5                 | 5                      |

There are many different formatting styles (see https://github.com/astanin/python-tabulate). Pick one that suits you, or define your own.

Lucas