Python- Cleaning the data from .txt file?

Question

import pandas as pd
import os
import numpy as np
import re


#LOAD THE DATA
df = pd.read_fwf('receipt.txt')

data= df.replace("£", "")

print(data)

I have tried to clean this data by removing the symbols "£", ":" and "-". Could I please have help on how best to remove these symbols from my data? Please see the attached image.

import pandas as pd
import os
import numpy as np


#LOAD THE DATA
df = pd.read_fwf('receipt.txt')
df.head()

Screenshot of txt file

It would be much better if you posted the actual file, not a screenshot. Remember, if you make it more difficult for people to help you, you are less likely to get help. — jpnadas, Jun 10 '20 at 13:50
Does this answer your question? [How to replace a characters in a column of a Pandas dataframe?](https://stackoverflow.com/questions/28986489/how-to-replace-a-characters-in-a-column-of-a-pandas-dataframe) — jpnadas, Jun 10 '20 at 13:51
Thank you for the tip! I tried to attach the .txt file, but there seems to be no place for file uploads. I referred to the link you sent and received the error: 'DataFrame' object has no attribute 'str' – Rimi, Jun 10 '20 at 13:56

score 1 · Answer 1 · answered Jun 10 '20 at 13:57

You can use string replace and just substitute the undesired strings with empty string "", essentially deleting them.

Example:

str.replace("unwanted", "")

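If you don't need to do this on every run of your code, consider cleaning the data outside your script with a simple shell command such as "tr -d 'idontwantthis'" (assuming Linux/macOS). Note that tr -d deletes every occurrence of each listed character, not a whole substring.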
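As a minimal sketch of the replace approach: call replace on a string variable holding the file contents, not on the built-in str type. The helper name remove_substrings is made up for illustration, and the sample text is shortened from the line the asker posts in the comments. Note that deleting ":" and "-" also strips them from the timestamps.

```python
def remove_substrings(text, unwanted):
    """Return text with every substring listed in `unwanted` deleted."""
    for item in unwanted:
        # str.replace returns a new string; strings are immutable
        text = text.replace(item, "")
    return text


# Sample shortened from the asker's comment; in practice `raw` would come
# from reading receipt.txt as plain text.
raw = "£ 2800.02020-06-08 19:48:28.975953£ 500.02020-06-08 19:48:47.833899"
cleaned = remove_substrings(raw, ["£", ":", "-"])
print(cleaned)
```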

Michal Fašánek

Thank you, I got the error: replace expected at least 2 arguments, got 1 – Rimi Jun 10 '20 at 14:06
I should have mentioned that "str" is your string variable. If you encounter any more errors, please paste your code. – Michal Fašánek Jun 10 '20 at 14:35
This is the data in the txt file: £ 2800.02020-06-08 19:48:28.975953£ 500.02020-06-08 19:48:47.833899£ 800.02020-06-08 19:49:45.017243 – Rimi Jun 10 '20 at 14:39
I still get 'empty data frame' – Rimi Jun 10 '20 at 14:40
@Rimi Are you doing this operation on a DataFrame? You have to do it on a string variable. Load the file as text, apply the fix, save it somewhere, THEN load the new file as a DataFrame. – Michal Fašánek Jun 11 '20 at 12:12
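The load, fix, save workflow described in the last comment could look roughly like the sketch below. The function name clean_text_file, the output file name receipt_clean.txt, and the one-amount-per-line demo content are assumptions for illustration, not the asker's real file.

```python
import tempfile
from pathlib import Path


def clean_text_file(src, dst, unwanted=("£",)):
    """Read src as text, delete each unwanted substring, write the result to dst."""
    text = Path(src).read_text(encoding="utf-8")
    for item in unwanted:
        text = text.replace(item, "")
    Path(dst).write_text(text, encoding="utf-8")
    return text


# Demo on a throwaway file with hypothetical contents.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "receipt.txt"
    dst = Path(tmp) / "receipt_clean.txt"  # hypothetical output name
    src.write_text("£ 2800.0\n£ 500.0\n", encoding="utf-8")
    print(repr(clean_text_file(src, dst)))
```

The cleaned file can then be loaded the same way as in the question, for example with pd.read_fwf pointed at the new file.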

score 1 · Answer 2 · answered Jun 10 '20 at 14:24

You could just do:

readfilestr.replace("[the text to remove goes here]", "")

Manu1800

Thank you. I am getting the following: Empty DataFrame Columns: [£, 2800.02020-06-08, 19:48:28.975953£, 500.02020-06-08, 19:48:47.833899£, 800.02020-06-08, 19:49:45.017243] Index: [] – Rimi Jun 10 '20 at 14:28
If you want to remove those symbols from every item in the list, you can use a for loop: for i in range(len(mylist)): if "[unwanted]" in mylist[i]: mylist[i] = str(mylist[i]).replace("[unwanted]", "") (you can also add a nested for loop if your list is multi-dimensional) – Manu1800 Jun 11 '20 at 16:19
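Written out in full, the loop idea from the comment above might look like this sketch. The name clean_items is hypothetical, and nested lists are handled by recursion here instead of an explicit second loop. The sample fields are taken from the column names in the asker's output.

```python
def clean_items(items, unwanted):
    """Return a copy of items with each unwanted substring removed.

    Nested lists are cleaned recursively; other values are converted to str.
    """
    cleaned = []
    for item in items:
        if isinstance(item, list):
            cleaned.append(clean_items(item, unwanted))
        else:
            text = str(item)
            for symbol in unwanted:
                text = text.replace(symbol, "")
            cleaned.append(text)
    return cleaned


fields = ["£", "2800.02020-06-08", "19:48:28.975953£"]
print(clean_items(fields, ["£"]))
```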

score 0 · Answer 3 · answered Jun 10 '20 at 13:56

You can take a look at the Regular Expressions (RegEx) module re.

import re

string = "test with £,:,-"

new_string= re.sub('[£:-]', "", string)

print(new_string) # test with ,,

There are some good examples here,
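For the data the asker posted in the comments, a regular expression can also pull the records apart without deleting the ":" and "-" inside the timestamps. This is only a sketch: it assumes every amount has exactly one digit after the decimal point (as in the posted sample), which is what allows the amount to be split from the date that directly follows it. The names RECORD and parse_receipt are made up for illustration.

```python
import re

# "£", optional spaces, an amount with one decimal digit (assumption),
# then a timestamp of the form YYYY-MM-DD HH:MM:SS.ffffff
RECORD = re.compile(
    r"£\s*(\d+\.\d)(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
)


def parse_receipt(text):
    """Return a list of (amount, timestamp) tuples found in text."""
    return [(float(amount), stamp) for amount, stamp in RECORD.findall(text)]


raw = (
    "£ 2800.02020-06-08 19:48:28.975953"
    "£ 500.02020-06-08 19:48:47.833899"
    "£ 800.02020-06-08 19:49:45.017243"
)
for amount, stamp in parse_receipt(raw):
    print(amount, stamp)
```

The resulting list of tuples can be passed to pd.DataFrame with a columns argument if a DataFrame is needed.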

MBoaretto

You can read each row and clean the fields. I just showed an example of regex. – MBoaretto Jun 10 '20 at 19:42
