
I scraped some data from a website (using pandas and BeautifulSoup) and am now ready to clean it.

The dataset is named datasetk.

First problem: Some numbers look like 11.0k. I want to remove the k, remove the decimal point, and add two zeros to get 11000 (11 thousand).

Second problem: Some numbers look like 5.0m. I want to remove the m, remove the decimal point, and add five zeros to get 5000000 (5 million).

I want to do this in a loop, in Python or R, so I don't have to fix each value manually.

  • Can you clarify your question? Please see [ask], [help/on-topic]. – AMC Mar 31 '20 at 23:46
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 31 '20 at 23:56

1 Answer


The stringr package provides functions that make working with regex easier. You can adapt the patterns and replacements to add or remove text as needed. Code below:

library(stringr)

people <- c("10,000", "200", "5K", "2000000", "2M")  # before using regex
print(people)

# Replace a trailing K (either case) with three zeros
people <- str_replace(people, "[Kk]$", "000")

# Replace a trailing M (either case) with six zeros
people <- str_replace(people, "[Mm]$", "000000")

print(people)    # after manipulation with regex

Output below:

[1] "10,000"  "200"     "5K"      "2000000" "2M"     
[1] "10,000"  "200"     "5000"    "2000000" "2000000"
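
The values in the question are lowercase and carry a decimal part (11.0k, 5.0m), which plain suffix replacement does not cover. Since the question allows Python as well, here is a minimal sketch using only the standard library. The function name `expand_suffix` and the sample list are made up for illustration, and it assumes every value is a plain number with an optional trailing k or m:

```python
from decimal import Decimal

# Multipliers for the two suffixes mentioned in the question.
SUFFIXES = {"k": 1_000, "m": 1_000_000}


def expand_suffix(value: str) -> int:
    """Convert strings such as '11.0k' or '5.0m' to plain integers."""
    # Drop whitespace and thousands separators; treat K/k and M/m alike.
    text = value.strip().replace(",", "").lower()
    multiplier = 1
    if text and text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    # Decimal avoids float rounding surprises for values like '1.1k'.
    return int(Decimal(text) * multiplier)


# Hypothetical sample standing in for the scraped column.
datasetk = ["11.0k", "5.0m", "10,000", "200", "5K", "2M"]
cleaned = [expand_suffix(v) for v in datasetk]
print(cleaned)  # [11000, 5000000, 10000, 200, 5000, 2000000]
```

If the data lives in a pandas DataFrame, the same function could be applied to a column with `Series.map`, for example `df["col"].map(expand_suffix)`, where `"col"` is a placeholder for the real column name.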