
I am trying out this text spinner, but I am having trouble adding a line break to the string it creates. As you can see in the code below, I add "\n", but neither the output of print() nor the content of the DataFrame contains this break.

import pandas as pd
import spintax

df = pd.DataFrame()

for i in range(0, 50):
    data = spintax.spin("{option1|option2}" +  "\n" + " blablabla ")
    df = df.append({'A': data}, ignore_index=True)

df['A'] = df['A'].str.replace(r'\s+', " ")
print(df)

How could I make it work?

print(df) output looks like this:

                         A
0   option2 blablabla 
1   option2 blablabla 
2   option2 blablabla 
3   option2 blablabla 
4   option2 blablabla 
Marius Mucenicu
Questieme

2 Answers


The problem is the `str.replace(r'\s+', " ")` call. The pattern `\s+` matches any run of whitespace, including line breaks, so the "\n" you added is replaced with a single space.

If you comment out that line, the strings retain the newline character:

import pandas as pd
import spintax

rows = []
for i in range(50):
    data = spintax.spin("{option1|option2}" + "\n" + " blablabla ")
    rows.append({'A': data})
# DataFrame.append was removed in pandas 2.0, so build the frame from a list
df = pd.DataFrame(rows)

# df['A'] = df['A'].str.replace(r'\s+', " ")

print(df)

Is that what you wanted to achieve?
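
If you still need to tidy up stray spaces, one option is to collapse only the whitespace that is not a newline. The sketch below uses the standard `re` module; the helper name `collapse_inline_whitespace` and the sample string are made up for illustration and are not part of spintax or pandas.

```python
import re

# [^\S\n] reads as "not non-whitespace and not a newline",
# i.e. any whitespace character except "\n".
_INLINE_WS = re.compile(r"[^\S\n]+")


def collapse_inline_whitespace(text: str) -> str:
    """Collapse runs of spaces and tabs to one space, leaving newlines in place."""
    return _INLINE_WS.sub(" ", text)


sample = "option1\n   blablabla \t end"
cleaned = collapse_inline_whitespace(sample)
print(repr(cleaned))
```

The same pattern should also work on the column itself, for example `df['A'].str.replace(r'[^\S\n]+', ' ', regex=True)`, assuming a pandas version that accepts the `regex` argument.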

Ahsun Ali
  • Oh, it makes sense now. Yes, that is pretty much it. I will have to figure out another way to remove the extra white spaces, then (they occur sometimes, and I didn't find out what I was doing wrong). – Questieme Sep 25 '19 at 13:30
  • Run from the command line, this yields `option1\n blablabla`. Was that the expected answer? – Celius Stingher Sep 25 '19 at 13:34

This is not possible in the way I believe you intend, because you would end up with an extra row in your DataFrame without an index value. The definition of a DataFrame does not support that layout: "Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)."

I believe the output you are after looks like this:

                         A
0   option2
    blablabla
1   option2
    blablabla
2   option2 
    blablabla 
3   option2 
    blablabla 
4   option2 
    blablabla 

As a workaround, you can split the string into two columns and add an extra column that flags where the line break should appear, so that concatenating the columns of a row gives the string you want:

import pandas as pd
import spintax

rows = []
for i in range(50):
    data = spintax.spin("{option1|option2}" + "\n" + " blablabla ")
    rows.append({'A': data})
# DataFrame.append was removed in pandas 2.0, so build the frame from a list
df = pd.DataFrame(rows)

# Collapse all whitespace (including the newline) into single spaces
df['A'] = df['A'].str.replace(r'\s+', " ", regex=True)
print(df)

# Split on spaces and keep the first two tokens
df['split'] = df['A'].str.split(' ')
df['first'] = df['split'].str.get(0)
df['flag_break'] = '\n'  # marks where the line break belongs
df['second'] = df['split'].str.get(1)
df['full_string'] = df['first'] + " " + df['flag_break'] + df['second']
df = df.drop('split', axis=1)

print(df.head())
print(df['full_string'].max())

Output of the DataFrame:

                     A    first flag_break     second          full_string
0   option2 blablabla   option2         \n  blablabla  option2 \nblablabla
1   option1 blablabla   option1         \n  blablabla  option1 \nblablabla
2   option2 blablabla   option2         \n  blablabla  option2 \nblablabla
3   option1 blablabla   option1         \n  blablabla  option1 \nblablabla
4   option2 blablabla   option2         \n  blablabla  option2 \nblablabla

Printing a single full string with print(df['full_string'].max()) shows the line break:

option2
blablabla
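
The split-and-rejoin idea can also be sketched in plain Python, without the helper columns. The function name `break_after_first_word` is hypothetical, and the sketch assumes each value has the shape "first-word rest". Unlike the column version above, it does not leave a trailing space before the line break.

```python
def break_after_first_word(text: str) -> str:
    """Put a line break between the first word and the rest of the text."""
    # partition splits on the first space only; strip removes outer whitespace first.
    first, _, rest = text.strip().partition(" ")
    # If there is no second part, return the single word unchanged.
    return f"{first}\n{rest}" if rest else first


rows = ["option2 blablabla ", "option1 blablabla "]
for row in rows:
    print(break_after_first_word(row))
```

If the values live in a DataFrame column, something like `df['A'].map(break_after_first_word)` should apply it row by row.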
Celius Stingher