I have read in a dataframe with an encoding of 'latin1' ... I applied this to the dataframe column in question:

output = []
for item in enumerate(capacity):
    filter(str.isdigit, item)
    output.append(item)

The dtype of my pandas object is 'dtype('O')'

This is what the pandas dataframe looks like:

    0   1
0   0   3850
1   1   3800
2   2   3700
3   3   3400
4   4   2600
... ... ...
6473    6473    1000
6474    6474    1000000
6475    6475    40000
6476    6476    40000
6477    6477    NaN

And when I use 'output[1].unique()', I get values like: '10000 sulf', '1222(gold', '79000 Pyr:'

My question is: how can I remove the non-digit characters from the number strings in the dataframe and convert the number strings to int?

I'm using Python v3.8.5

  • How did you read it into a dataframe? It may be better to parse the original text file. Can you post a few of the lines? – tdelaney Apr 20 '21 at 02:19
  • I read the file in as: df = pd.read_csv(path,encoding='latin1') – Pete_Turnham Apr 20 '21 at 02:20
  • This question should not have been closed without finding out whether the numbers really are properly broken out into columns. I suspect this is a harder question than a quick regex. – tdelaney Apr 20 '21 at 02:21
  • Can you show some original text and print out a sample dataframe so we can see how the data you show fit into its rows and columns? – tdelaney Apr 20 '21 at 02:22
  • I edited the question @tdelaney. Thank you so much for joining in on this one! – Pete_Turnham Apr 20 '21 at 02:37
  • You are seeing some odd values in your data (e.g., '1222(gold', '79000 Pyr:'). Since you read it as CSV, you should be able to find the lines that contain these values. You could pull out a few samples of those lines and create a test program that works on just them. Post that, then we can work with it. Does this CSV have just two columns? How are we to understand this data that has some gold and some pyr? What number should be extracted from that? In summary, you need to show the questionable data so we know what to do. – tdelaney Apr 20 '21 at 02:53

1 Answer

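filter(str.isdigit, item) does not modify item in place. It returns a new iterator over the characters that pass the test, and your loop discards that result. Note also that enumerate(capacity) yields (index, value) tuples rather than the values themselves, so iterate over capacity directly. You might want to do:

output = []
for item in capacity:
    # filter() yields only the digit characters; join them back into one string
    after_filter = "".join(filter(str.isdigit, str(item)))
    output.append(after_filter)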
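Keeping every digit character also merges separate numbers that share a cell, and it leaves an empty string for cells with no digits (such as NaN), which int() cannot convert. As a minimal sketch, assuming the number you want is the first run of digits in each cell (the question does not confirm this), something like the following may work. The helper name first_int and the sample values are illustrative only; the strings are the ones quoted in the question.

```python
import re

# Assumption: the value of interest is the first run of digits in each cell.
_FIRST_DIGITS = re.compile(r"\d+")

def first_int(value):
    """Return the first run of digits in value as an int, or None if there is none."""
    match = _FIRST_DIGITS.search(str(value))
    return int(match.group()) if match else None

# Sample cells: messy strings from the question, a plain number, and a missing value.
samples = ["10000 sulf", "1222(gold", "79000 Pyr:", 3850, float("nan")]
cleaned = [first_int(v) for v in samples]
print(cleaned)
```

On a pandas column this could be applied per cell with capacity.map(first_int). Because missing values remain, a plain int dtype will not fit the result; the nullable "Int64" dtype is one option for holding integers alongside missing entries.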
Ynjxsjmh