
I'm importing data from a database into a Python DataFrame. I want to use the data for further analysis, but I need to clean it up a little first. Currently, the required column is formatted like ('2275.1', '1950.4'). The output I need is just the two values, 2275.1 and 1950.4. Can someone please help?

user4943236

5 Answers

import re
test_str = "('2275.1', '1950.4')"  # sample value from the question
print(re.findall(r"\b\d+(?:\.\d+)?\b", test_str))

You can simply do this.

or

print(list(map(float, re.findall(r"\b\d+(?:\.\d+)?\b", test_str))))

If you want float values.
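
To clean a whole column rather than a single value, the same pattern can be applied to each entry in turn. The following is a minimal sketch, assuming every entry arrives as a string shaped like the one in the question; the `rows` list and the `extract_floats` helper are hypothetical stand-ins for the real DataFrame column and are not part of the original answer.

```python
import re

# Same pattern as above: an unsigned integer or decimal number.
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")

def extract_floats(text):
    """Return every unsigned number found in text as a float (hypothetical helper)."""
    return [float(match) for match in NUMBER.findall(text)]

# Hypothetical sample standing in for the database column.
rows = ["('2275.1', '1950.4')", "('10', '3.5')"]
cleaned = [extract_floats(row) for row in rows]
print(cleaned)
```

Note that this pattern does not capture a leading minus sign or exponent notation, so it only suits columns of plain non-negative numbers.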

vks

Try ast.literal_eval, which evaluates its argument as a constant Python expression:

import ast

data = ast.literal_eval("('2275.1', '1950.4')")
# data is now the Python tuple ('2275.1', '1950.4')

x, y = data
# x is '2275.1' and y is '1950.4'
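
If numeric values are needed rather than strings, the parsed tuple can be passed through float. This is a minimal sketch, assuming the column value is the text of a tuple of numeric strings; the `parse_pair` helper is a hypothetical name introduced here for illustration.

```python
import ast

def parse_pair(text):
    """Parse text such as "('2275.1', '1950.4')" into a tuple of floats (hypothetical helper)."""
    values = ast.literal_eval(text)          # text -> tuple of strings
    return tuple(float(v) for v in values)   # strings -> floats

first, second = parse_pair("('2275.1', '1950.4')")
print(first, second)
```

ast.literal_eval raises ValueError or SyntaxError when the text is not a valid Python literal, so malformed rows surface as exceptions instead of being skipped silently.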
nneonneo

I assume the string you provided is actually output from Python, so it is a tuple containing two strings that represent numbers. If so, and you want to get rid of the quotes ('), convert the strings to a numeric type such as float:

a = ('2275.1', '1950.4')
a = [float(item) for item in a]  # convert each string to a float
print(a)
# prints [2275.1, 1950.4]
jhoepken

This is one way to do it:

import re
x = "'('2275.1', '1950.4')'"
y = re.findall(r'\d+\.\d+', x)  # \d+ after the dot keeps every decimal digit
for i in y:
    print(i)

Output:

2275.1
1950.4
Joe T. Boka

Here is a non-regex approach, assuming the value is already a tuple:

data = ('2275.1', '1950.4')

result = data[0]   # index 0 is the first element of the tuple
result2 = data[1]  # index 1 is the second element

print(result)
print(result2)

Output:

2275.1
1950.4
Roy Holzem