1

I have a dataframe with an ID column and a TEXT column. I want to create another dataframe by splitting the sentences in the TEXT column on the dot while keeping the original ID.

So the phrase "I love cats. I hate snakes" becomes two sentences in 2 rows of the new dataframe:

0 `I love cats`
0 `I hate snakes`

Original Dataframe:

ID                      TEXT
1    This is a msg. Another msg
2    The weather is hot, the water is cold. My hands are freezing 

Transformed Dataframe:

ID                      TEXT
1      This is a msg
1      Another msg
2      The weather is hot, the water is cold
2      My hands are freezing

The code to build the dataframe:

import pandas as pd
df = pd.DataFrame({'ID':[1,2], 'TEXT':['This is a msg. Another msg', 'The weather is hot, the water is cold. My hands are freezing']})

I am trying to use split, as in df['TEXT'].astype(str).split('.'), but I keep getting errors because Series objects have no split method.

datashout

2 Answers

1

You also need to set ID as the index beforehand so that the exploded rows keep their respective IDs:

df.set_index('ID', inplace=True)
split = df['TEXT'].str.split('.').explode()
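
For reference, here is a minimal end-to-end sketch of the same idea, using the example data from the question. The names `sentences` and `result` are only illustrative, and the `.str.strip()` call and the empty-string filter are additions that assume you want to drop the space left after each dot and any empty piece produced by a trailing dot:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'TEXT': ['This is a msg. Another msg',
             'The weather is hot, the water is cold. My hands are freezing'],
})

# Use ID as the index so every exploded piece keeps its ID.
sentences = (
    df.set_index('ID')['TEXT']
      .str.split('.')   # each cell becomes a list of pieces
      .explode()        # one row per piece, index (ID) repeated
      .str.strip()      # assumption: drop the whitespace around each piece
)

# Assumption: drop empty pieces, e.g. from a text that ends with a dot.
sentences = sentences[sentences != '']

# Turn the ID index back into a regular column.
result = sentences.reset_index()
print(result)
```

Using `set_index('ID', inplace=True)` as above works just as well; the chained form simply leaves the original `df` unchanged.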
Nuri Taş
0

Instead of df['TEXT'].astype(str).split('.')

try: df['TEXT'].str.split('.').explode()
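
If you would rather not touch the index, one possible variant is to replace the TEXT column with the lists and explode the whole dataframe on that column, so the ID of each row is repeated automatically. This is only a sketch: `result` is an illustrative name, the final strip is an assumption about the desired output, and `ignore_index=True` requires pandas 1.1 or later:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'TEXT': ['This is a msg. Another msg',
             'The weather is hot, the water is cold. My hands are freezing'],
})

result = (
    # Replace TEXT with a list of pieces per row.
    df.assign(TEXT=df['TEXT'].astype(str).str.split('.'))
      # One row per list element; the other columns (ID) are repeated.
      .explode('TEXT', ignore_index=True)
)

# Assumption: remove the space that follows each dot.
result['TEXT'] = result['TEXT'].str.strip()
print(result)
```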

Jeru Luke
gtomer
  • Your suggestion works. Now I get a list [This is a msg, Another msg] and I need it in different rows, but keeping the ID. I am having a hard time doing it ;-( – datashout Sep 04 '22 at 18:10
  • Thanks @gtomer. I can't vote on your answer because I am new to Stack Overflow, but you helped a lot. Nuri Taş's answer solved my problem completely thanks to the set_index operation that I was missing. Huge thanks though. – datashout Sep 04 '22 at 18:33