How to remove carriage return in a dataframe

Question

I have a dataframe with the columns id, country_name, location and total_deaths. During data cleaning, I came across a value in one row that has '\r' attached. After cleaning, I store the resulting dataframe in a destination.csv file. Because that row has \r attached, it always creates a new row in the CSV file.

id                               29
location            Uttar Pradesh\r
country_name                  India
total_deaths                     20

I want to remove \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.

Is there another solution? Can somebody help?

Edit:

In the process above, I iterate over df to check whether \r is present and, if it is, replace it. Here row.replace() and row.str.strip() don't seem to work, or I could be using them the wrong way.

I don't want to specify the column name or row number when using replace(), because I can't be certain that only the 'location' column will contain \r. Please find the code below.

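import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\\r", str(row)):
        print(type(row))  # Return type is pandas.Series
        # Series.replace returns a new Series; that result is discarded here,
        # and writes to an iterrows() row are not guaranteed to reach df.
        row.replace({r'\\r': ''}, regex=True)
        print(row)
        count += 1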

Does `df.replace({r'\\r': ''}, regex=True)` not work either? Why use `iterrows()`? I think it is not necessary, because iterating is very slow. — jezrael, May 11 '16 at 12:13
I have no other way to iterate over df. `df.replace({r'\\r': ''}, regex=True)` isn't working — Saranya, May 11 '16 at 12:46

jezrael · Accepted Answer · 2016-05-11T16:38:56.810

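Another solution is to use str.strip (note that strip removes any of the given characters from both ends, not a substring, so a trailing letter r is removed as well):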

df['29'] = df['29'].str.strip(r'\\r')
print(df)
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
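A minimal sketch of the character-set behaviour of str.strip, assuming values that contain the literal text \r (a backslash followed by r). The value "Kashmir\\r" is hypothetical, chosen only because the name itself ends in r; an anchored regex removes just the suffix.

```python
import pandas as pd

# Hypothetical values holding the literal two-character text "\r".
s = pd.Series(["Uttar Pradesh\\r", "Kashmir\\r", "India"])

# str.strip treats r"\\r" as the character set {"\\", "r"}, so it also
# removes the genuine trailing "r" of "Kashmir".
stripped = s.str.strip(r"\\r")

# The regex \\r$ matches a literal backslash + "r" only at the very end.
fixed = s.str.replace(r"\\r$", "", regex=True)

print(stripped.tolist())  # ['Uttar Pradesh', 'Kashmi', 'India']
print(fixed.tolist())     # ['Uttar Pradesh', 'Kashmir', 'India']
```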

If you want to use replace, add the r prefix and one extra \ (r'\\r'):

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
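A self-contained sketch of the replace call above, using the test frame suggested later in the comments, where the value holds the literal text \r rather than a carriage-return character:

```python
import pandas as pd

# Frame from the comments: "Uttar Pradesh\\r" ends in a backslash + "r".
df = pd.DataFrame({
    "id": ["location", "country_name", "total_deaths"],
    "29": ["Uttar Pradesh\\r", "India", "20"],
})

# In the raw string r"\\r" the regex engine sees \\r: an escaped backslash
# followed by "r", which matches the literal text in the cell.
cleaned = df.replace({r"\\r": ""}, regex=True)

print(cleaned["29"].tolist())  # ['Uttar Pradesh', 'India', '20']
```

replace returns a new DataFrame, so df itself still contains the original value.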

In replace you can also restrict the replacement to one column with a nested dict:

print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20

print(df.replace({'29': {r'\\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
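The two dict shapes above can be compared side by side in a runnable sketch that rebuilds the example frame (literal \r text in both columns):

```python
import pandas as pd

# Same layout as the example: stray literal "\r" text in both columns.
df = pd.DataFrame({
    "id": ["location", "country_name", "total_deaths\\r"],
    "29": ["Uttar Pradesh\\r", "India", "20"],
})

# Nested dict {column: {pattern: replacement}} limits the regex to column "29".
only_29 = df.replace({"29": {r"\\r": ""}}, regex=True)

# Flat dict {pattern: replacement} applies the regex to every column.
everywhere = df.replace({r"\\r": ""}, regex=True)

print(only_29["id"].tolist())     # ['location', 'country_name', 'total_deaths\\r']
print(everywhere["id"].tolist())  # ['location', 'country_name', 'total_deaths']
```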

EDIT based on the comments:

import pandas as pd

df = pd.read_csv('data_source_test.csv')
print(df)
   id country_name           location  total_deaths
0   1        India          New Delhi           354
1   2        India         Tamil Nadu            48
2   3        India          Karnataka             0
3   4        India      Andra Pradesh            32
4   5        India              Assam           679
5   6        India             Kerala           128
6   7        India             Punjab             0
7   8        India      Mumbai, Thane             1
8   9        India  Uttar Pradesh\r\n            20
9  10        India             Orissa            69

print(df.replace({r'\r\n': ''}, regex=True))
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
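A sketch that ties this back to the original destination.csv problem. It assumes the location value ends in a real carriage return and line feed, as the successful r'\r\n' regex above implies; the three-row frame is a hypothetical subset of data_source_test.csv, and the CSV is written to a string rather than a file.

```python
import pandas as pd

# Hypothetical subset: the location value ends in real CR + LF characters.
df = pd.DataFrame({
    "id": [8, 9, 10],
    "country_name": ["India", "India", "India"],
    "location": ["Mumbai, Thane", "Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [1, 20, 69],
})

# r"\r\n" is the regex for the CR and LF characters themselves;
# numeric columns are left alone.
cleaned = df.replace({r"\r\n": ""}, regex=True)

# With the stray line break gone, the CSV has one header line plus one
# line per row.
csv_text = cleaned.to_csv(index=False)
print(len(csv_text.splitlines()))  # 4
```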

If you need to replace only in the location column:

df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
print(df)
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
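One version detail for the column-only call: since pandas 2.0, Series.str.replace treats the pattern as a plain string unless regex=True is passed. In that case r'\r\n' would be the four characters backslash, r, backslash, n, which do not match real CR/LF. A minimal sketch with a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    "country_name": ["India", "India"],
    "location": ["Uttar Pradesh\r\n", "Orissa"],
})

# regex=True makes r"\r\n" a regex for the CR and LF characters,
# independent of the pandas version's default.
df["location"] = df["location"].str.replace(r"\r\n", "", regex=True)

print(df["location"].tolist())  # ['Uttar Pradesh', 'Orissa']
```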

Thanks! The above solutions don't seem to solve my problem, or I might be doing something wrong. In the process above, I am basically iterating over a dataframe and checking whether `\r` is present. If it is, I need to perform the replace. I have edited my question again. — Saranya, May 11 '16 at 12:12
OK, if you use this `DataFrame` - `df = pd.DataFrame({'id': {0: 'location', 1: 'country_name', 2: 'total_deaths'}, '29': {0: 'Uttar Pradesh\\r', 1: 'India', 2: '20'}})`, does it work? And what does `print(df['29'].tolist())` return? — jezrael, May 11 '16 at 12:53
The above `DataFrame` returns the following result: `['Uttar Pradesh\\r', 'India', '20']`. I have added my test file and data source to https://github.com/itsmesaranya/data-cleaning. Can you please have a look? — Saranya, May 11 '16 at 16:26
Wow! :) It works. This is what I have been trying for hours. Thank you so much. Looks like I need to learn a lot! — Saranya, May 11 '16 at 16:45

EdChum · Answer 2 · score 3 · 2016-05-11T11:19:39.187

Use str.replace. You need to escape the backslash so that the pattern matches the literal text \r (a backslash followed by r) rather than a carriage-return character:

In [15]:
df['29'] = df['29'].str.replace(r'\\r', '', regex=True)
df

Out[15]:
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

edited May 11 '16 at 11:19

answered May 11 '16 at 11:14

Gwen Au · Answer 3 · score 3 · 2019-10-28T00:00:18.287

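The code below removes \t tabs, \n newlines and \r carriage returns, and is useful for condensing a value into one row. The first pattern removes the literal text sequences (a backslash followed by t, n or r); the second removes the actual characters. The answer was taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a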

df = df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["", ""], regex=True)
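A small runnable check of the two-pattern replace, on hypothetical data that mixes real control characters with their literal backslash spellings:

```python
import pandas as pd

# Hypothetical values: real CR/LF in the first, literal "\t" text in the second.
df = pd.DataFrame({"note": ["line one\r\nline two", "tab\\there", "ok"]})

df = df.replace(
    # First pattern: literal backslash sequences such as the text "\t".
    # Second pattern (not a raw string): actual tab, LF and CR characters.
    to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"],
    value=["", ""],
    regex=True,
)

print(df["note"].tolist())  # ['line oneline two', 'tabhere', 'ok']
```

Note that removing a line break without a replacement glues the neighbouring words together; replacing with " " instead may read better for multi-line text.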

edited Oct 28 '19 at 00:00

answered Oct 27 '19 at 23:34

score 0 · Answer 4 · edited Mar 17 '20 at 20:23

Assign the result of df.replace back to df, then print df:

df = df.replace({'\r': ''}, regex=True)
print(df)
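The reason assignment matters is that DataFrame.replace returns a new DataFrame instead of modifying df. A minimal sketch with a hypothetical one-column frame:

```python
import pandas as pd

df = pd.DataFrame({"location": ["Uttar Pradesh\r", "Orissa"]})

# Calling replace without keeping the result leaves df untouched.
df.replace({"\r": ""}, regex=True)
print(df["location"].tolist())  # ['Uttar Pradesh\r', 'Orissa']

# Assigning the result back keeps the cleaned values.
df = df.replace({"\r": ""}, regex=True)
print(df["location"].tolist())  # ['Uttar Pradesh', 'Orissa']
```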

answered Mar 17 '20 at 19:45 by user13078533, edited by Nicolas Gervais

That answer already exists, word for word. I suggest you delete it to avoid cluttering the answer space with answers that already exist, out of respect for future readers and for the user who already posted your answer. – Nicolas Gervais Mar 17 '20 at 20:25

score 0 · Answer 5 · answered Mar 23 '21 at 13:45

Somehow, the accepted answer did not work for me. Ultimately, I found the solution by doing the following:

df["29"] = df["29"].replace(r'\r', '', regex=True)

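The difference is that I use \r instead of \\r: in a raw string, r'\r' is the regex for a carriage-return character, while r'\\r' matches a literal backslash followed by r.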
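A minimal sketch of that difference, assuming the cell ends in an actual carriage-return character (one character) rather than the text \r:

```python
import pandas as pd

# Hypothetical values: the first ends in a real carriage-return character.
s = pd.Series(["Uttar Pradesh\r", "India"])

# r"\\r" is the regex \\r: a literal backslash followed by "r". No match here.
escaped = s.replace(r"\\r", "", regex=True)

# r"\r" is the regex \r: the carriage-return character itself. This matches.
plain = s.replace(r"\r", "", regex=True)

print(escaped.tolist())  # ['Uttar Pradesh\r', 'India']
print(plain.tolist())    # ['Uttar Pradesh', 'India']
```

Which pattern is right therefore depends on what the data actually contains; printing `df['29'].tolist()`, as suggested in the comments on the accepted answer, shows the difference ('\\r' for the literal text, '\r' for the real character).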

Yusril Maulidan Raji
