I am trying to predict stock price movements using a Random Forest, but my Volume column isn't in numeric format. Instead of values like "23.9M", "24K" and "67M", I want the column to contain 23900000, 24000 and 67000000. Can anyone help me with code to convert the column to numeric?
-3
-
Please go through the [intro tour](https://stackoverflow.com/tour), the [help center](https://stackoverflow.com/help) and [how to ask a good question](https://stackoverflow.com/help/how-to-ask) to see how this site works and to help you improve your current and future questions, which can help you get better answers. "Show me how to solve this coding problem?" is off-topic for Stack Overflow. You have to make an honest attempt at the solution, and then ask a *specific* question about your implementation. Stack Overflow is not intended to replace existing tutorials and documentation. – Prune Apr 26 '21 at 05:53
-
First and foremost, Stack Overflow is not a coding service. Second, the problem you cite is already solved in many places. See [How much research](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users) and the [Question Checklist](https://meta.stackoverflow.com/questions/260648/stack-overflow-question-checklist). – Prune Apr 26 '21 at 05:54
-
Does this answer your question? [how to make a variable change from the text "1m" into "1000000" in python](https://stackoverflow.com/questions/2449848/how-to-make-a-variable-change-from-the-text-1m-into-1000000-in-python) – Aditya Apr 26 '21 at 05:56
-
`df["Vol"].replace({"K": "e3", "M": "e6"}, regex=True).astype(float)` – Corralien Apr 26 '21 at 06:09
2 Answers
0
You can use a lambda function:

    mul = {'M': 1e6, 'K': 1e3}
    conv = lambda num: float(num[:-1]) * mul[num[-1].upper()]
    conv('2.3M')

To apply it to a column in the dataframe:

    df['Volume'] = df['Volume'].apply(conv)

Or, as suggested by @Corralien in the comments:

    df["Vol"].replace({"K": "e3", "M": "e6"}, regex=True).astype(float)
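Both one-liners assume every entry ends in `K` or `M`. If the column might also contain plain numbers, thousands separators, or a `-` placeholder, a slightly more defensive converter could look like the sketch below. The `parse_volume` name, the `B` suffix, and the treatment of `-` as missing are assumptions for illustration, not something taken from the question's dataset.

```python
from decimal import Decimal

# Multipliers for the suffixes; "B" is an assumption, not in the question's data.
SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9}


def parse_volume(text):
    """Convert strings such as '23.9M' or '24K' to an integer count."""
    cleaned = str(text).strip().replace(",", "").upper()
    if not cleaned or cleaned == "-":
        return None  # treat blanks and '-' placeholders as missing
    if cleaned[-1] in SUFFIXES:
        number, factor = cleaned[:-1], SUFFIXES[cleaned[-1]]
    else:
        number, factor = cleaned, 1  # no suffix: already a plain number
    # Decimal avoids binary floating-point rounding before the final int().
    return int(Decimal(number) * factor)


print([parse_volume(v) for v in ["23.9M", "24K", "67M", "1,250", "-"]])
# [23900000, 24000, 67000000, 1250, None]
```

It can be applied to a column the same way as the lambda, e.g. `df['Volume'].apply(parse_volume)`.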

Aditya
-1
This is a rather tedious method, but you could do something like this:

    converted = []
    for i in volume_data:
        if i[-1] == 'M':
            val = float(i[:-1]) * 1000000
        elif i[-1] == 'K':
            val = float(i[:-1]) * 1000
        else:
            val = float(i)  # no suffix: already a plain number
        converted.append(val)

You can then replace the current column values with the new values in `converted`.
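If the data lives in a pandas DataFrame, the write-back step can also be done without an explicit loop. The following is only a sketch: the `Volume` column name and the sample values are assumed from the question, and it assumes each entry is a number optionally followed by `K` or `M`.

```python
import pandas as pd

# Hypothetical sample; the real column name and values may differ.
df = pd.DataFrame({"Volume": ["23.9M", "24K", "67M"]})

# Split each entry into its numeric part and an optional K/M suffix.
parts = df["Volume"].str.extract(r"^([\d.]+)([KM]?)$")
factor = parts[1].map({"K": 1e3, "M": 1e6, "": 1.0})

# Overwrite the column with the numeric result.
df["Volume"] = parts[0].astype(float) * factor
print(df["Volume"].tolist())
```

Entries that do not match the pattern end up as `NaN`, which makes unexpected formats easy to spot before training a model.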
It would really be helpful if you could provide more information about your data.

Rahil Kadakia