How can I add a full stop to the end of a text only when it is missing? My attempt below does not produce the desired combined text.

# Import libraries
import pandas as pd
import numpy as np
 
# Initialize list of lists
data = [['text with a period.', '111A.'], 
        ['text without a period', '222B'], 
        ['text with many periods...', '333C'],
        [np.nan, '333C'],
        [np.nan, np.nan]]
 
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['text1', 'text2'])

combined_df = df.copy()
combined_df["combined_text"] = df["text1"].fillna("") + ". " + df["text2"].fillna("") + "."
combined_df

Desired output: [image: combined_df snapshot]

gracenz
  • See https://stackoverflow.com/questions/52722996/using-strip-to-remove-only-one-element. You can use `rstrip()` to remove characters only from the right side. – Ignatius Reilly Jun 20 '22 at 15:35

2 Answers

You can use `where` and `str.cat`:

df['combined_text'] = df.text1.where(df.text1.str.endswith('.'),  df.text1 + '.').str.cat(
                        df.text2.where(df.text2.str.endswith('.'),  df.text2 + '.'),
                        sep=' ',
                        na_rep=''
                      ).str.strip().replace('', np.nan)

Result:

                       text1  text2                    combined_text
0        text with a period.  111A.        text with a period. 111A.
1      text without a period   222B     text without a period. 222B.
2  text with many periods...   333C  text with many periods... 333C.
3                        NaN   333C                            333C.
4                        NaN    NaN                              NaN

(this also works for the case when text1 is given and text2 is NaN)
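The parenthetical claim can be checked with a small sketch. The helper names `add_period` and `combine` and the `demo` frame are illustrative assumptions, not part of the answer. It also passes `na=False` to `str.endswith` so that the mask is strictly boolean, which is a small variation on the code above rather than a copy of it.

```python
import numpy as np
import pandas as pd


def add_period(s: pd.Series) -> pd.Series:
    """Append '.' to strings that do not already end with one; NaN stays NaN."""
    # na=False makes the mask strictly boolean, so missing values fall
    # through to `s + '.'`, which is still NaN.
    return s.where(s.str.endswith('.', na=False), s + '.')


def combine(df: pd.DataFrame) -> pd.Series:
    """Join text1 and text2 with a space, skipping missing parts."""
    joined = add_period(df['text1']).str.cat(
        add_period(df['text2']), sep=' ', na_rep=''
    )
    # Strip the stray separator left by a missing side, then restore NaN
    # for rows where both sides were missing.
    return joined.str.strip().replace('', np.nan)


# Hypothetical rows: only text1 present, only text2 present, both missing.
demo = pd.DataFrame(
    [['only text1', np.nan], [np.nan, '333C'], [np.nan, np.nan]],
    columns=['text1', 'text2'],
)
combined = combine(demo)
print(combined)
```

Wrapping the per-column step in a function keeps the two `where` calls identical, so a later change to the "already punctuated" rule only has to be made in one place.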

Stef
  • @gracenz: to add a period only if the string doesn't end in any of the punctuation marks `.`, `?` or `!`, you need to change `.str.endswith('.')` to `.str.contains('[.?!]$')` (`$` denotes the end of the string). – Stef Jun 20 '22 at 19:07
  • OP = Original Poster, see https://meta.stackoverflow.com/questions/253162/what-is-an-op-when-referring-to-stack-exchange – Stef Jun 20 '22 at 19:07
  • One more hint: in order to comment on some answer, please use the comment button *under the answer your comment pertains to*, otherwise things can get a bit confused :) – Stef Jun 20 '22 at 19:09
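The regex variant suggested in the comments can be sketched as follows. The function name `add_period_unless_punctuated` and the sample series `s` are hypothetical, and `na=False` is an added assumption so that missing values yield a plain boolean mask.

```python
import numpy as np
import pandas as pd


def add_period_unless_punctuated(s: pd.Series) -> pd.Series:
    """Append '.' unless the string already ends with '.', '?' or '!'."""
    # Inside a character class, '.' and '?' are literal characters;
    # '$' anchors the match at the end of the string.
    ends_with_mark = s.str.contains(r'[.?!]$', na=False)
    # Rows where the mask is False take `s + '.'`; NaN + '.' is still NaN.
    return s.where(ends_with_mark, s + '.')


# Hypothetical sample values covering each case.
s = pd.Series(['Done!', 'Really?', 'plain', 'ok.', np.nan])
result = add_period_unless_punctuated(s)
print(result)
```

Only `'plain'` gains a period here; the other strings keep their existing punctuation and the missing value stays missing.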

Hope this helps:

data = [['this is the first text with a period.', '111A.'], 
        ['this is the second text without a period', '222B'], 
        ['this is the third text with many periods...', '333C'],
        [np.nan, '333C'],
        [np.nan, np.nan]]

# Create the pandas DataFrame

df = pd.DataFrame(data, columns=['text1', 'text2'])

combined_df = df.copy()
combined_df["combined_text"] = df.text1.str.split('.').str[0] + '. ' + df.text2.str.split('.').str[0]

print(combined_df)
                                         text1  text2                                   combined_text
0        this is the first text with a period.  111A.      this is the first text with a period. 111A
1     this is the second text without a period   222B  this is the second text without a period. 222B
2  this is the third text with many periods...   333C  this is the third text with many periods. 333C
3                                          NaN   333C                                             NaN
4                                          NaN    NaN                                             NaN
ragas
  • did you compare your result with the `desired compared text` column in the OP? – Stef Jun 20 '22 at 16:28
  • Thanks @Stef for your solution. How can it be extended so that no full stop is added when the text already ends with other punctuation, e.g. `!` or `?`? As far as I understand, `.str.endswith` takes only one value, and for now that is a full stop. – gracenz Jun 20 '22 at 18:32
  • Sorry but what is OP? – gracenz Jun 20 '22 at 18:33
  • Thanks @ragas for your solution. However, I need the existing punctuation left unchanged (not reduced to a single full stop) when text1 or text2 already ends with punctuation. – gracenz Jun 20 '22 at 18:36