I have a dataframe extracted with Pandas for which one of the colums looks something like this:
What I want to do is to extract the numerical values (floats) in this column, which by itself I could do. The issue comes because I have some cells, like the cell 20 in the image, in which I have more than one number, so I would like to make an average of these values. I think that for that I would first need to recognize the different groups of numerical values in the string (each float number) and then extract them as floats to then operate with them. I don't know how to do this.
Edit: I have found an solution to this using the re.findall command from regex. This is based on the answer of a question in this thread Find all floats or ints in a given string.
for index,value in z.iteritems():
z[index]=statistics.mean([float(h) for h in re.findall(r'(?:\b\d{1,2}\b(?:\.\d*))',value)])
Note that I haven't included match for integers, and only account for values up to 99, just due to the type of data that I have.
However, I get a warning with this approach, due to the loop (there is no warning when I do it only for one element of the series):
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
Although I don't see any issue happening with my data, is this warning important?