1

import pandas as pd
import os
import numpy as np
import re


#LOAD THE DATA
df = pd.read_fwf('receipt.txt')

data= df.replace("£", "")

print(data)

I have attempted to clean this data by removing the symbols "£", ":" and "-". Could I please have help on how best to remove these symbols from my data? Please see the image attached.

import pandas as pd
import os
import numpy as np


#LOAD THE DATA
df = pd.read_fwf('receipt.txt')
df.head()

Screenshot of txt file

Rimi
  • 11
  • 4
  • 1
    It would be much better if you posted the actual file, not a screenshot. Remember, if you make it more difficult for people to help you, you are less likely to get help. – jpnadas Jun 10 '20 at 13:50
  • 1
    Does this answer your question? [How to replace a characters in a column of a Pandas dataframe?](https://stackoverflow.com/questions/28986489/how-to-replace-a-characters-in-a-column-of-a-pandas-dataframe) – jpnadas Jun 10 '20 at 13:51
  • Thank you for the tip! I tried to attach the .txt file, but there seems to be no place for file uploads. I followed the link you sent and received the error: 'DataFrame' object has no attribute 'str' – Rimi Jun 10 '20 at 13:56

3 Answers

1

You can use string replace and just substitute the undesired strings with empty string "", essentially deleting them.

Example:

str.replace("unwanted", "")

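If you don't need to do this on every run of your code, consider cleaning the data outside your script with a simple shell command such as tr -d 'idontwantthis' (assuming Linux/OSX). Note that tr -d deletes every occurrence of each listed character rather than the string as a whole.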
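
A rough Python counterpart of the tr -d idea, for anyone who would rather stay inside the script: str.translate with a deletion table removes every occurrence of each listed character. This is only a sketch. The sample text is copied from the asker's comment, and the set of characters to delete ("£", ":" and "-") is assumed from the question.

```python
# Sample text copied from the asker's comment; the characters to delete
# are an assumption based on the symbols named in the question.
sample = "£ 2800.02020-06-08 19:48:28.975953£ 500.02020-06-08 19:48:47.833899"

# With a third argument, str.maketrans maps each listed character to None,
# so translate() deletes every occurrence of "£", ":" and "-".
delete_table = str.maketrans("", "", "£:-")
cleaned = sample.translate(delete_table)

print(cleaned)
```

Like tr -d, this works character by character, so it also strips the "-" and ":" inside the dates and times.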

Michal Fašánek
  • 513
  • 5
  • 17
  • 1
    Thank you, I got the error: replace expected at least 2 arguments, got 1 – Rimi Jun 10 '20 at 14:06
  • 1
    I should have mentioned that "str" is your string variable. If you encounter any more errors, please paste your code – Michal Fašánek Jun 10 '20 at 14:35
  • 1
    This is the data in the txt file: £ 2800.02020-06-08 19:48:28.975953£ 500.02020-06-08 19:48:47.833899£ 800.02020-06-08 19:49:45.017243 – Rimi Jun 10 '20 at 14:39
  • 1
    I still get 'empty data frame' – Rimi Jun 10 '20 at 14:40
  • @Rimi Are you applying this operation to a DataFrame? You have to apply it to a string variable. Load the file as text, apply the fix, save the result somewhere, THEN load the new file as a DataFrame – Michal Fašánek Jun 11 '20 at 12:12
1

You could just do:

readfilestr.replace("[the text to remove goes here]", "")
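
As a minimal sketch of that idea, assuming readfilestr holds the whole file read as one string: the helper names strip_substrings and clean_file below are hypothetical, and the UTF-8 encoding is an assumption about the file.

```python
from pathlib import Path


def strip_substrings(text, unwanted):
    """Return text with every occurrence of each unwanted substring removed."""
    for token in unwanted:
        text = text.replace(token, "")
    return text


def clean_file(src, dst, unwanted=("£", ":", "-")):
    """Read src as plain text, strip the unwanted substrings, write to dst."""
    # Assumption: the file is UTF-8 encoded; adjust if yours is not.
    raw = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(strip_substrings(raw, unwanted), encoding="utf-8")


print(strip_substrings("£ 500.0 19:48", ("£", ":")))
```

The cleaned file written by clean_file can then be loaded into pandas in place of the original.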
Manu1800
  • 145
  • 7
  • 1
    Thank you. I am getting the following: Empty DataFrame Columns: [£, 2800.02020-06-08, 19:48:28.975953£, 500.02020-06-08, 19:48:47.833899£, 800.02020-06-08, 19:49:45.017243] Index: [] – Rimi Jun 10 '20 at 14:28
  • If you want to remove those symbols from every item in the list, you can use a for loop: for i in range(len(mylist)): if "[unwanted]" in mylist[i]: mylist[i] = str(mylist[i]).replace("[unwanted]", "") (you can also add a nested loop if your list is multidimensional) – Manu1800 Jun 11 '20 at 16:19
0

You can take a look at the Regular Expressions (RegEx) module re.

import re

string = "test with £,:,-"

new_string= re.sub('[£:-]', "", string)

print(new_string) # test with ,,

There are some good examples here.
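
The same module can also split the text into records instead of only deleting symbols, which may be closer to what the data needs, since the sample posted in the comments runs the amount and the timestamp together on a single line. The sketch below is only a guess at the layout: it assumes every amount has exactly one decimal digit and every timestamp looks like "YYYY-MM-DD HH:MM:SS.ffffff". The name RECORD is hypothetical.

```python
import re

# Sample text copied from the asker's comment.
sample = ("£ 2800.02020-06-08 19:48:28.975953"
          "£ 500.02020-06-08 19:48:47.833899"
          "£ 800.02020-06-08 19:49:45.017243")

# Assumed layout: "£", optional spaces, an amount with ONE decimal digit,
# then a timestamp "YYYY-MM-DD HH:MM:SS.ffffff" with no separator between.
RECORD = re.compile(r"£\s*(\d+\.\d)(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")

# findall returns one (amount, timestamp) tuple per matched record.
records = [(float(amount), stamp) for amount, stamp in RECORD.findall(sample)]

for amount, stamp in records:
    print(amount, stamp)
```

If the amounts can have a different number of decimals, the boundary between amount and date becomes ambiguous and the pattern would need adjusting.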

MBoaretto
  • 21
  • 5