0

I'm stuck here because I have created a list for the two data frames. I have two tables with two columns each. The first table have columns product_name and brand, and the second table have columns product_name and shipping. Im trying to do a one to one join so I can have three columns on one table. it give me an error: KeyError: 'shipping'

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import pandas as pd
from collections import defaultdict
import re

url='https://www.newegg.com/PS4-Video-Games/SubCategory/ID-3141'

with uReq(url) as uClient:
    page = uClient.read()

# parsing
page_soup = soup(page, "html.parser")

# grabs products
containers= page_soup.findAll("div",{"class":"item-container"})

# file
filename = "products.csv"

d = defaultdict(list)
d1 = defaultdict(list)

# fill dict
for container in containers:
    brand = container.div.div.a.img["title"]

    title = container.findAll("a", {"class":"item-title"})
    product_name = title[0].text

    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()
    
    d['brand'].append(brand)
    d['product'].append(product_name)
    d1['product'].append(product_name)
    d1['shipping'].append(shipping)
    
# create dataframe    
df = pd.DataFrame(d)
df1 =pd.DataFrame(d1)



# clean shipping column
df['shipping'] = df['shipping'].apply(lambda x: 0 if x == 'Free Shipping' else x)
df['shipping'] = df['shipping'].apply(lambda x: 0 if x == 'Special Shipping' else x) # probably should be handled in a special way
df['shipping'] = df['shipping'].apply(lambda x: x if x == 0 else re.sub("[^0-9]", "", x))
df['shipping'] = df['shipping'].astype(float)

# save dataframe to csv file
df.to_csv('dataframe.csv', index=False)
df1.to_csv('dataframe1.csv', index=False)
# choose rows where shipping is less than 5.99
#print(df[df['shipping'] > 200])
    
#merge two data sets 
df3 = pd.merge(df,df1)
print(df3)
Juan Rivas
  • 23
  • 5

1 Answers1

0

Use this:

df3 = df.merge(df1, on="product", how="left")

depends on what you prefer, you may use how='inner' or how='outer' instead.

adrtam
  • 6,991
  • 2
  • 12
  • 27
  • not really. when I running the compiler gives me three columns but only product and brand shows up but not shipping. instead of the brand data it only gives me there dots for every row. – Juan Rivas Oct 14 '20 at 23:16
  • ` Product ... Brand_y 0 The Last of Us Part II - PlayStation 4 ... PlayStation 1 The Last of Us Part II - PlayStation 4 ... PlayStation`. I get these but is missing the shipping row – Juan Rivas Oct 14 '20 at 23:59