I am using Jupyter to analyze a large CSV file.

The file contains around 40,000 str values and 15 float values. I am trying to convert all the str values to numeric so I can analyze the data.

However, I cannot because of the float values scattered randomly through the data. Is there a simple way to remove all of these values?

I am relatively new to coding, so please bear with me if this seems like a "dumb" question.

import pandas as pd

df = pd.read_csv('stripperdata.csv')

for i in df['Pressure']:
    if isinstance(i , str):
        int(i)
    if isinstance(i , float):
        df.remove(i)

When I do this, I get the error "invalid literal for int() with base 10:"

SeaBean
SeanK22
  • What does the CSV look like? You might be able to use a regex to get rid of the float values. – PApostol Oct 04 '21 at 19:02
  • You could just turn everything to float. Alternately, you could just check if the string has a dot in it, and if it does turn it to float, else int. – OneMadGypsy Oct 04 '21 at 19:03
  • Welcome to Stack Overflow! Please edit your question to include a minimal, reproducible example with sample input, expected output, and code for what you've already tried based on your own research so that we can better understand how to help. Based on the pandas tag, look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for guidance as well – G. Anderson Oct 04 '21 at 19:06
  • The CSV file has 2 columns: one is date/time, the other is pressure. The rows in the pressure column are the ones that are messed up – SeanK22 Oct 04 '21 at 19:22
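
A minimal sketch of the "turn everything to float" idea from the comments above, assuming the two-column layout just described. The sample rows and the `Timestamp` column name are invented for illustration; only `Pressure` comes from the question. `pd.to_numeric` with `errors="coerce"` turns any entry it cannot parse into NaN, so the bad rows can be dropped afterwards. Note that this keeps valid decimals such as 99.5 rather than removing them.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file.
csv_text = """Timestamp,Pressure
2021-10-04 19:00,101
2021-10-04 19:01,150+A2202:B2303
2021-10-04 19:02,99.5
2021-10-04 19:03,102
"""

df = pd.read_csv(io.StringIO(csv_text))

# Entries that cannot be parsed as numbers become NaN instead of raising.
df["Pressure"] = pd.to_numeric(df["Pressure"], errors="coerce")

# Drop the rows whose pressure could not be converted.
df = df.dropna(subset=["Pressure"]).reset_index(drop=True)

print(df)
```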

2 Answers

Edit: I made a mistake in my code the first time. I was removing the index during iteration causing it to skip over one of the elements. I admit this is a messy solution. I'm still learning myself.

values = ["11", "15", "74", "2.3", "11.7", "34"]

# Build a new list instead of removing items from `values` while
# looping over it, which can silently skip elements.
cleaned = []

for value in values:
    if "." in value:
        # A decimal point marks a float-like string: leave it out.
        print("Here's one: " + value)
    else:
        # Whole-number string: keep the converted integer.
        cleaned.append(int(value))

print(cleaned)
Meatforge
  • Thanks for helping. When I do this I get an error that says "invalid literal for int() with base 10: '150+A2202:B2303'". That 150 value is one of the floats from my sheet that I am trying to get rid of. – SeanK22 Oct 04 '21 at 19:25
  • I made a change. I made the assumption that the floats were indeed floats and not also strings. This should work if every value is a string. – Meatforge Oct 04 '21 at 19:28
  • No problem, this also worked. Thanks for the help. I don't have enough reputation to "thumbs up", so I'm sorry. – SeanK22 Oct 04 '21 at 19:29
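
The error reported in the comments above comes from a string that is not a plain number at all, so checking for a "." would not catch it. One possible way around that, sketched here with made-up sample values and a hypothetical helper named `to_int_or_none`, is to attempt the conversion and skip anything that raises `ValueError`:

```python
# Sample values are assumptions; one mimics the malformed entry from the comment.
values = ["11", "15", "150+A2202:B2303", "2.3", "34"]


def to_int_or_none(text):
    """Return int(text), or None if text is not a whole-number string."""
    try:
        return int(text)
    except ValueError:
        return None


converted = [to_int_or_none(v) for v in values]

# Keep only the entries that converted cleanly.
cleaned = [n for n in converted if n is not None]

print(cleaned)
```

Both the malformed entry and the decimal string "2.3" are dropped, because `int()` rejects each of them.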

Assuming you have the following dataframe:

df = pd.DataFrame({'val': ['1', 2.0, '3', 4, '5', '6.6', '7', '8.8']})

   val
0    1
1  2.0          <=== float
2    3
3    4          <=== int
4    5
5  6.6
6    7
7  8.8

Here 2.0 and 4 are actual float and int objects; the other values are numeric strings.

You can drop the float and int values by, for example:

s_cleaned = df['val'].loc[~df['val'].map(lambda x: isinstance(x, (float, int)))]

Result:

print(s_cleaned)


0      1
2      3
4      5
5    6.6
6      7
7    8.8
Name: val, dtype: object

You can also "remove" these float and int values by changing them to NaN (null values), as follows:

df['val'] = df['val'].mask(df['val'].map(lambda x: isinstance(x, (float, int))))

Result:

print(df)

   val
0    1
1  NaN
2    3
3  NaN
4    5
5  6.6
6    7
7  8.8
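
After masking, the column still holds strings, so one more step is needed before numeric analysis. A minimal sketch of that follow-up, reusing the sample dataframe from this answer; combining the mask with `pd.to_numeric` and `dropna` is a suggestion here, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'val': ['1', 2.0, '3', 4, '5', '6.6', '7', '8.8']})

# Flag genuine float/int objects, then replace them with NaN.
is_number = df['val'].map(lambda x: isinstance(x, (float, int)))
masked = df['val'].mask(is_number)

# Convert the remaining strings to numbers and drop the masked rows.
numeric = pd.to_numeric(masked, errors='coerce').dropna()

print(numeric)
```

The result is a float64 series that keeps the original index labels of the surviving rows.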
SeaBean