
When I run the code below, I get ValueError: invalid literal for int() with base 10: ' '. I suspect the problem is in the type conversion, but I don't know how to fix it. Can you help me, please? This is my code:

#preprocessing
df['Memory'] = df['Memory'].astype(str).replace('.0', '', regex=True)
df["Memory"] = df["Memory"].str.replace('GB', '')
df["Memory"] = df["Memory"].str.replace('TB', '000')
new = df["Memory"].str.split("+", n = 1, expand = True)
df["first"]= new[0]
df["first"]=df["first"].str.strip()
df["second"]= new[1]
df["Layer1HDD"] = df["first"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer1SSD"] = df["first"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer1Hybrid"] = df["first"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer1Flash_Storage"] = df["first"].apply(lambda x: 1 if "Flash Storage" in x else 0)
df['first'] = df['first'].str.replace(r'D', '')
df["second"].fillna("0", inplace = True)
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)
df['second'] = df['second'].str.replace(r'D', '')
#binary encoding
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)
#only keep integert(digits)
df['second'] = df['second'].str.replace(r'D','')#convert to numeric




df['second'] = df['second'].astype(int)
df['first'] = df['first'].astype(int)
df['second'] = df['second'].astype(int)



#finalize the columns by keeping value
df["HDD"]=(df["first"]*df["Layer1HDD"]+df["second"]*df["Layer2HDD"])
df["SSD"]=(df["first"]*df["Layer1SSD"]+df["second"]*df["Layer2SSD"])
df["Hybrid"]=(df["first"]*df["Layer1Hybrid"]+df["second"]*df["Layer2Hybrid"])
df["Flash_Storage"]=(df["first"]*df["Layer1Flash_Storage"]+df["second"]*df["Layer2Flash_Storage"])
#Drop the un required columns
df.drop(columns=['first', 'second', 'Layer1HDD', 'Layer1SSD', 'Layer1Hybrid',
       'Layer1Flash_Storage', 'Layer2HDD', 'Layer2SSD', 'Layer2Hybrid',
       'Layer2Flash_Storage'],inplace=True)

I get the error in the title when I run this code. Unfortunately, my knowledge of Python is limited and I don't know how to solve it. Can you help me? My dataset is here

nesly
  • Can you paste the entire traceback here? – SiP Apr 13 '23 at 15:15
  • ValueError Traceback (most recent call last) Cell In[60], line 31 25 #only keep integert(digits) 26 df['second'] = df['second'].str.replace(r'D','')#convert to numeric ---> 31 df['second'] = df['second'].astype(int) 32 df['first'] = df['first'].astype(int) 33 df['second'] = df['second'].astype(int) – nesly Apr 13 '23 at 15:37
  • Why are you repeating `df['second'] = df['second'].astype(int)`? – SiP Apr 13 '23 at 15:43
  • From the traceback you gave it's not clear where the error actually is. Are you sure it's the complete traceback? – SiP Apr 13 '23 at 15:44
  • I am actually following the steps here exactly: https://www.analyticsvidhya.com/blog/2021/11/laptop-price-prediction-practical-understanding-of-machine-learning-project-lifecycle/ I don't know enough to do all this myself. – nesly Apr 13 '23 at 15:50
  • Please edit tracebacks etc. into your question (formatted as code). Comments are unreadable for code and tracebacks. – 9769953 Apr 13 '23 at 15:55
  • For debugging help, you need to make a minimal reproducible example. For specifics if you're using Pandas or something similar, see How to make good reproducible pandas examples (/q/20109391/4518341). And for more tips, like how to write a good title, see How to Ask. – wjandrea Apr 13 '23 at 16:00

1 Answer


You get ValueError: invalid literal for int() with base 10 because df['second'].astype(int) is trying to convert a Series that still contains non-numeric values to int.
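One way to see which values are blocking the conversion is to coerce the column with pd.to_numeric and inspect what fails. This is only a minimal sketch: the sample values below are made up for illustration and are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical sample imitating df['second'] after the faulty replace
second = pd.Series(["0", "1000 H", "256 SS"])

# errors="coerce" turns every value that is not a valid number into NaN
as_numbers = pd.to_numeric(second, errors="coerce")

# The rows that became NaN are the ones astype(int) cannot convert
bad_values = second[as_numbers.isna()].tolist()
print(bad_values)
```

Any value that shows up in bad_values still contains characters that int() cannot parse.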

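In the line df['second'] = df['second'].str.replace(r'D','') your regex is wrong: r'D' matches only the literal letter D, while \D matches any non-digit character. To remove all non-numeric characters, use the following (regex=True is needed on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):

df['second'] = df['second'].str.replace(r'\D+', '', regex=True)

Do the same for the series df['first']:

df['first'] = df['first'].str.replace(r'\D+', '', regex=True)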
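Putting the fix together, the cleaning and conversion steps might look like the sketch below. The DataFrame is a hypothetical sample shaped like the 'first' and 'second' columns just before the digit-only step, not the real laptop dataset.

```python
import pandas as pd

# Hypothetical sample; the real columns come from splitting df['Memory'] on '+'
df = pd.DataFrame({
    "first": ["128 SSD", "500 HDD", "1000 HDD"],
    "second": ["  1000 HDD", "0", "0"],
})

for col in ["first", "second"]:
    # \D+ matches runs of non-digit characters (letters, spaces, ...).
    # regex=True is passed explicitly because pandas 2.0+ would otherwise
    # treat the pattern as a literal string.
    df[col] = df[col].str.replace(r"\D+", "", regex=True)
    # Only digits remain, so the integer conversion now succeeds
    df[col] = df[col].astype(int)

print(df)
```

Note that a value containing no digits at all would become an empty string, and astype(int) would still fail on it; in that case pd.to_numeric(..., errors="coerce") is one option for handling it.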
Henrique Andrade