I have a huge dataframe composed of 7 columns. Extract:
45589 664865.0 100000.0 7.62275 -.494 1.60149 100010
...
57205 718888.0 100000.0 8.218463 -1.405-3 1.75137 100010
...
55143 711827.0 100000.0 8.156107 9.8336-3 1.758051 100010
As these values come from an input file, there are currently all of string type and I would like to change all the dataframe to float through :
df= df.astype('float')
However, as you might have noticed on the extract, there are ' - ' hiding. Some represent the negative value of the whole number, such as -.494 and others represent a negative power, such as 9.8-3.
I need to replace the latters with 'E-' so Python understands it's a power and can switch the cell to a float type. Usually, I would use:
df= df.replace('E\-', '-', regex=True)
However, this would also add an E to my negative values. To avoid that, I tried the solution offered here: Replace all a in the middle of string by * using regex
str = 'JAYANTA POKED AGASTYA WITH BAAAAMBOO '
str = re.sub(r'\BA+\B', r'*', str)
However, this is for one specific string. As my dataframe is quite large, I would like to avoid having to go through each cell.
Is there a combination of the functions replace
and re.sub
I could use in order to only replace the'-' surrounded by other strings by 'E-'?
Thank you for your help!