When I run the following code, I get this warning:

Warning message:
In Ops.factor(df$`Net Worth`[1], df$`Net Worth`[2]) :
  ‘+’ not meaningful for factors
library(XML)
library(htmltab)
library(dplyr)
library(RCurl)
library(bitops)

u = c("The Richest People in Tech List.html")
tables = readHTMLTable(u)
tables$the_list
data <- tables$the_list
df<-as.data.frame(data)
df$`Net Worth`[1]+df$`Net Worth`[2]

My data is shown below:

    Rank            Name Net Worth Age Origin of Wealth       Country
1     #1      Bill Gates   $84.5 B  63        Microsoft United States
2     #2      Jeff Bezos   $81.7 B  55       Amazon.com United States
3     #3 Mark Zuckerberg   $69.6 B  35         Facebook United States
4     #4   Larry Ellison   $59.3 B  75         software United States
5     #5      Larry Page   $43.9 B  46           Google United States
6     #6     Sergey Brin   $42.7 B  46           Google United States
7     #7         Jack Ma   $37.4 B  55       e-commerce         China
8     #8      Ma Huateng   $36.7 B  48   internet media         China
9     #9   Steve Ballmer   $32.9 B  63        Microsoft United States
10   #10    Michael Dell   $22.4 B  54   Dell computers United States
zx8754
  • Run str(df) to find out what type of data you're working with. The error suggests that you need to clean the data first before you can start doing calculations. The Net Worth column will need to be converted to an integer or numeric type in order to use the "+" function. – Cam McMains Oct 07 '19 at 06:48
  • Read about `make.names` to make R friendly names, to avoid messing around with backticks. – zx8754 Oct 07 '19 at 07:07
  • Related post: convert `$ Billions` to numeric before doing any arithmetic on them: https://stackoverflow.com/q/45972571/680068 – zx8754 Oct 07 '19 at 07:11
  • @CamMcMains I tried using as.numeric, but it does not give the desired output. Could you please help me with the exact code? – Kartik Shah Oct 08 '19 at 07:36
  • @zx8754 gsub does not give the desired output because my data contains both $ and B/M characters. Could you please help me with the exact code? – Kartik Shah Oct 08 '19 at 07:37
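
As a rough sketch of what the comments above suggest: the snippet below rebuilds two rows of the table with the `Net Worth` column stored as a factor (an assumption about what the import produced, consistent with the `Ops.factor` warning), inspects it with `str`, and uses `make.names` to get a column name that needs no backticks. The data frame `demo` is hypothetical and only stands in for the scraped table.

```r
# Hypothetical two-row stand-in for the scraped table.
# check.names = FALSE keeps the space in "Net Worth";
# stringsAsFactors = TRUE mimics columns that were read in as factors.
demo <- data.frame(Name = c("Bill Gates", "Jeff Bezos"),
                   `Net Worth` = c("$84.5 B", "$81.7 B"),
                   check.names = FALSE, stringsAsFactors = TRUE)

str(demo)                                  # reports "Net Worth" as a Factor
is_factor <- is.factor(demo$`Net Worth`)   # TRUE, so `+` is not meaningful

# make.names() turns "Net Worth" into the syntactic name "Net.Worth"
names(demo) <- make.names(names(demo))

# Convert the factor to character before any cleaning or arithmetic
demo$Net.Worth <- as.character(demo$Net.Worth)
```

Once the column is character, the `$` and the unit suffix still have to be stripped before `as.numeric` returns anything other than `NA`.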

2 Answers

We need to remove the $ and replace B with e9, as in the linked post, and then convert to numeric so that we can do arithmetic (+):

# remove "$" and replace "B" with billions "e9"
df1$NetWorthClean <- as.numeric(gsub(" B", "e9", 
                                     gsub("$", "", df1$NetWorth, fixed = TRUE), 
                                     fixed = TRUE))

df1$NetWorthClean[ 1 ] + df1$NetWorthClean[ 2 ]
# [1] 1.662e+11

Example data:

df1 <- read.table(text = "Rank,Name,NetWorth,Age,OriginofWealth,Country
1,BillGates,$84.5 B,63,Microsoft,UnitedStates
2,JeffBezos,$81.7 B,55,Amazon.com,UnitedStates
3,MarkZuckerberg,$69.6 B,35,Facebook,UnitedStates
4,LarryEllison,$59.3 B,75,software,UnitedStates
5,LarryPage,$43.9 B,46,Google,UnitedStates
6,SergeyBrin,$42.7 B,46,Google,UnitedStates
7,JackMa,$37.4 B,55,e-commerce,China
8,MaHuateng,$36.7 B,48,internetmedia,China
9,SteveBallmer,$32.9 B,63,Microsoft,UnitedStates
10,MichaelDell,$22.4 B,54,Dellcomputers,UnitedStates", sep = ",",header = TRUE)
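
The asker's comment mentions values with both B and M suffixes, which the `gsub` call above does not cover. A minimal base R sketch that handles both, assuming every value looks like `$<number> B` or `$<number> M`; the helper name `parse_worth` and the sample vector (including the `$950 M` value) are hypothetical:

```r
# Hypothetical helper: convert strings such as "$84.5 B" or "$950 M" to numbers.
parse_worth <- function(x) {
  x <- as.character(x)                           # also accepts factor input
  x <- gsub("[$, ]", "", x)                      # drop "$", commas and spaces
  suffix <- toupper(sub("^[0-9.]+", "", x))      # what remains after the number
  value  <- as.numeric(sub("[A-Za-z]+$", "", x)) # the number without the suffix
  multiplier <- c(B = 1e9, M = 1e6)[suffix]      # NA for any other suffix
  unname(value * multiplier)
}

worth <- parse_worth(c("$84.5 B", "$81.7 B", "$950 M"))
worth[1] + worth[2]
```

Values with an unrecognised or missing suffix come back as `NA` rather than being silently treated as plain dollars, which makes unexpected formats easier to spot.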
zx8754

Here is code that should work for you. It cleans up the column names of your data frame so that you can reference them without backticks. (R's syntactic names cannot contain spaces or most special characters, and cannot begin with a number.) Then it extracts the numeric part of each value, converts it to numeric, and multiplies it by 1 billion, since every value in this table is given in billions.

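library(dplyr)
library(stringr)
library(janitor)

df.clean <- df %>%
  clean_names() %>%  # `Net Worth` becomes net_worth
  # str_extract() returns one match per row as a character vector;
  # as.character() guards against the column being a factor
  mutate(numeric_worth = str_extract(as.character(net_worth), "[0-9.]+")) %>%
  # every value in this table is in billions ("B")
  mutate(numeric_worth = as.numeric(numeric_worth) * 1e+09)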