
I have a similar problem to the one posted here:

Pandas DataFrame: remove unwanted parts from strings in a column

I need to remove newline characters from within a string in a DataFrame. Basically, I've accessed an API using Python's json module, and that part works fine. Creating the DataFrame works great, too. However, when I finally write the result to a CSV, I get stuck, because the newlines inside the text create false 'new rows' in the CSV file.

So basically I'm trying to turn this:

'...this is a paragraph.

And this is another paragraph...'

into this:

'...this is a paragraph. And this is another paragraph...'

I don't care about preserving '\n' or any special symbol for the paragraph break, so it can be stripped right out.

I've tried a few variations:

misc['product_desc'] = misc['product_desc'].strip('\n')

AttributeError: 'Series' object has no attribute 'strip'

Here's another:

misc['product_desc'] = misc['product_desc'].str.strip('\n')

TypeError: wrapper() takes exactly 1 argument (2 given)

misc['product_desc'] = misc['product_desc'].map(lambda x: x.strip('\n'))
misc['product_desc'] = misc['product_desc'].map(lambda x: x.strip('\n\t'))

There is no error message, but the newline characters don't go away, either. Same thing with this:

misc = misc.replace('\n', '')
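A minimal sketch of why these attempts leave the newlines in place, assuming a recent pandas rather than 0.9.1 (the sample frame is hypothetical): `strip` only works at the ends of a string, and `replace` without `regex=True` compares whole cell values.

```python
import pandas as pd

# Hypothetical sample: one cell with an interior newline, one cell that is only "\n".
misc = pd.DataFrame({"product_desc": ["first paragraph.\n\nsecond paragraph.", "\n"]})

# str.strip trims characters only at the start and end of each string,
# so the "\n\n" between the paragraphs survives.
stripped = misc["product_desc"].map(lambda x: x.strip("\n"))
print(repr(stripped[0]))  # 'first paragraph.\n\nsecond paragraph.'

# Without regex=True, DataFrame.replace matches whole cell values,
# so only the cell that is exactly "\n" is changed.
replaced = misc.replace("\n", "")
print(repr(replaced.loc[0, "product_desc"]))  # 'first paragraph.\n\nsecond paragraph.'
print(repr(replaced.loc[1, "product_desc"]))  # ''
```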

The line that writes the CSV is:

misc_id.to_csv(r'C:\Users\jlalonde\Desktop\misc_w_id.csv', sep=' ', na_rep='', index=False, encoding='utf-8')

The pandas version is 0.9.1.

Thanks! :)

joseph_pindi

2 Answers


`strip` only removes the specified characters at the beginning and end of the string. To remove every `\n`, use `replace` instead:

misc['product_desc'] = misc['product_desc'].str.replace('\n', '')
BrenBarn
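A minimal sketch of the `str.replace` approach on hypothetical data shaped like the question's example. One detail worth noting: deleting `\n` outright joins "paragraph." and "And" with no space, while the question's target output has a space. Collapsing each run of line breaks into a single space with a regex is one alternative (my assumption about the desired result, not part of the answer). The explicit `regex=` keyword requires a much newer pandas than 0.9.1.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's example.
misc = pd.DataFrame({
    "product_desc": [
        "...this is a paragraph.\n\nAnd this is another paragraph...",
        "single line",
    ]
})

# Literal replacement, as in the answer: every "\n" is deleted.
removed = misc["product_desc"].str.replace("\n", "", regex=False)
print(removed[0])  # ...this is a paragraph.And this is another paragraph...

# Alternative: replace each run of line breaks (plus surrounding whitespace)
# with one space. \s also matches "\r", so "\r\n" line endings are covered.
joined = misc["product_desc"].str.replace(r"\s*\n\s*", " ", regex=True)
print(joined[0])  # ...this is a paragraph. And this is another paragraph...
```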

You could use the `regex` parameter of the `replace` method to achieve that:

misc['product_desc'] = misc['product_desc'].replace(to_replace='\n', value='', regex=True)
Anton Protopopov
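The two answers differ on columns that mix strings with other types. A minimal sketch on a hypothetical mixed column (assuming a recent pandas): `Series.replace(..., regex=True)` leaves non-string values untouched, whereas the `.str` accessor returns NaN for them.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing strings, a float and a missing value.
s = pd.Series(["line one\nline two", 3.5, np.nan, "no break"])

# Series.replace with regex=True: "\n" is matched anywhere inside each string;
# values that are not strings are passed through unchanged.
via_replace = s.replace(to_replace="\n", value="", regex=True)

# The .str accessor, by contrast, yields NaN for the non-string entry.
via_str = s.str.replace("\n", "", regex=False)

print(via_replace.tolist())
print(via_str.tolist())
```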
  • If `product_desc` may contain mixed values (e.g. float, str...), convert it to `str` first so that this works properly: `misc['product_desc'] = misc['product_desc'].astype(str).replace(to_replace='\n', value='', regex=True)`. Otherwise only `str` values will be replaced... – ragesz May 17 '16 at 16:36
  • `to_replace` can take a list, too: `.replace(to_replace=['\n', '\t'], value='', regex=True)` – BjoernL. Mar 04 '17 at 11:38
  • How do you replace items that are within a word? Example: 'This is a sente\tnce'. (Remove \t) – Arthur D. Howland Sep 20 '18 at 15:25
  • @ArthurD.Howland The code from the answer should work for those cases. – Anton Protopopov Sep 21 '18 at 08:29
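A minimal sketch of the suggestions in these comments, using hypothetical strings: a list passed as `to_replace` with `regex=True` removes each pattern wherever it occurs, including inside a word, and casting to `str` first puts non-string values through the same replacement.

```python
import pandas as pd

# Hypothetical strings: a tab inside a word, and a newline between words.
s = pd.Series(["This is a sente\tnce", "first\nsecond"])

# A list of patterns with regex=True: each match is replaced by value
# wherever it occurs in the string, including in the middle of a word.
cleaned = s.replace(to_replace=["\n", "\t"], value="", regex=True)
print(cleaned.tolist())  # ['This is a sentence', 'firstsecond']

# Casting to str first turns the float into text, so every value
# goes through the regex replacement.
mixed = pd.Series(["a\nb", 3.5]).astype(str)
mixed_clean = mixed.replace(to_replace="\n", value="", regex=True)
print(mixed_clean.tolist())  # ['ab', '3.5']
```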