
I'm struggling to remove some characters from the extracted data. So far I've only managed to remove the '£' from the price.

Outcome: What I am getting

Tried:

data = json.loads(r.text)

products = data['upcoming']


product_list = []

for product in products:
    price = product['price']
    date = product['launchDate']

    productsforsale = {
        'Retail_price': price,
        'Launch_date': date,
    }
    product_list.append(productsforsale)

    df = pd.DataFrame(product_list).replace('£',"")
    df.to_csv('PATH.csv')
    print('saved to file')

Expected outcome:

110.00    2023-01-15 08:00
  • Could you please add a .json snippet to reproduce your result? – Hermann12 Feb 13 '23 at 20:07
  • Please make sure you provide a [mcve], which includes enough data (preferably hardcoded into the sourcecode) to demonstrate the issue. As a new user here, also take the [tour] and read [ask]. – Ulrich Eckhardt Feb 13 '23 at 20:25

2 Answers

0

The price is a dictionary, so you can get the amount with price['amount']. The launch date can be converted to your desired format with the datetime module:

from datetime import datetime
datetime_date = datetime.strptime(date, "%Y-%m-%dT%H:%M:%S.%fZ")
new_date = datetime_date.strftime("%Y-%m-%d %H:%M")

I can't test it against your original .json snippet, though.
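A minimal sketch of that conversion on a hardcoded value. The sample timestamp is made up, chosen only to match the format string above; the real launchDate values may look different. The helper name reformat_launch_date is hypothetical.

```python
from datetime import datetime


def reformat_launch_date(raw: str) -> str:
    """Parse a timestamp like 2023-01-15T08:00:00.000Z and return 'YYYY-MM-DD HH:MM'."""
    # Step 1: describe what the incoming string looks like so it can be parsed.
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ")
    # Step 2: write the datetime back out in the desired layout.
    return parsed.strftime("%Y-%m-%d %H:%M")


sample = "2023-01-15T08:00:00.000Z"  # hypothetical value, not from the real feed
print(reformat_launch_date(sample))
```

If the real strings have no fractional seconds, the "%f" part of the format would need to be dropped, otherwise strptime raises a ValueError.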

0

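launchDate is a string, so it has to be parsed with strptime before it can be reformatted with strftime:

date = datetime.strptime(product['launchDate'], "%Y-%m-%dT%H:%M:%S.%fZ")
date = date.strftime("%Y-%m-%d %H:%M")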

You're not getting the price correctly at the moment: you are extracting the whole price element, which is a dictionary, but you only want the amount inside it. That is also why calling .replace() on it fails.

You can get the amount like this:

price = product['price']['amount']

The full code


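import json
from datetime import datetime

import pandas as pd

# r is the HTTP response object from the question
data = json.loads(r.text)
products = data['upcoming']

rows = []
for product in products:
    # price is a nested dict; keep only the amount
    price = product['price']['amount']
    # parse the string first, then reformat it
    date = datetime.strptime(product['launchDate'], "%Y-%m-%dT%H:%M:%S.%fZ")
    rows.append({"Price": price, "Date": date.strftime("%Y-%m-%d %H:%M")})

# DataFrame.append was removed in pandas 2.0, so build the frame from a list
df = pd.DataFrame(rows)
# index=False keeps the row index out of the file
df.to_csv('PATH.csv', index=False)
print('saved to file')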

This should save a CSV with two columns, Price and Date, with all the unnecessary information removed.
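Since the original JSON is not shown, here is a small self-contained sketch of the same extraction on a hardcoded sample. The JSON layout (an "upcoming" list whose items have a nested "price" dict with "amount" and "currency", plus a "launchDate" string) is an assumption pieced together from the question and comments, and extract_rows is a hypothetical helper name. It uses only the standard library so it can be run without pandas.

```python
import json
from datetime import datetime

# Hypothetical sample; the real response may contain more keys or other formats.
SAMPLE_JSON = """
{
  "upcoming": [
    {"price": {"amount": "110.00", "currency": "GBP"},
     "launchDate": "2023-01-15T08:00:00.000Z"}
  ]
}
"""


def extract_rows(raw_json):
    """Return one {'Price', 'Date'} dict per product in the 'upcoming' list."""
    data = json.loads(raw_json)  # nested dicts and lists
    rows = []
    for product in data["upcoming"]:
        # The price is itself a dict, so index into it instead of calling .replace().
        amount = product["price"]["amount"]
        launch = datetime.strptime(product["launchDate"], "%Y-%m-%dT%H:%M:%S.%fZ")
        rows.append({"Price": amount, "Date": launch.strftime("%Y-%m-%d %H:%M")})
    return rows


rows = extract_rows(SAMPLE_JSON)
print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) before calling to_csv.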

mrblue6
  • Thanks, I tried but I am getting: price = product['price'].replace('£',"").replace("GBP") ^^^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'dict' object has no attribute 'replace' –  Feb 13 '23 at 20:34
  • I just updated my answer now that you've added the JSON, you were doing it incorrectly. A JSON read by Python turns into a nested dictionary. See here: https://pybit.es/articles/case-study-how-to-parse-nested-json/ or here: https://www.geeksforgeeks.org/json-loads-in-python/ – mrblue6 Feb 13 '23 at 20:40
  • Thanks appreciate it, will read up. I am completely new trying to self teach so struggling at times. Also the amended doesn't work for me but will read the material. Regards –  Feb 13 '23 at 20:45
  • No worries, I can try help if you let me know the error. – mrblue6 Feb 13 '23 at 20:47
  • date = product['launchDate'].strftime("%Y-%m-%d %H:%M") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'str' object has no attribute 'strftime' then if I remove this line,: raise TypeError("Can only append a dict if ignore_index=True") TypeError: Can only append a dict if ignore_index=True –  Feb 13 '23 at 20:53
  • I've fixed it now. The problem was that you first have to specify what the date already looks like, then you can reformat to how you want it to look. – mrblue6 Feb 13 '23 at 21:06
  • Brilliant, thank you. I really appreciate your time/help. :) –  Feb 13 '23 at 21:09
  • Exactly as intended! –  Feb 13 '23 at 21:14
  • Sorry - do you know how to remove dates that are earlier than the current date? –  Feb 14 '23 at 14:28
  • https://stackoverflow.com/questions/8142364/how-to-compare-two-dates This link will help you. – mrblue6 Feb 14 '23 at 15:33
  • @Luke You can do an if statement to check if the date should be added or not before you add to the df – mrblue6 Feb 14 '23 at 15:34
  • :) I can't get my head around times etc lol. –  Feb 16 '23 at 13:56
  • Happy to help more if you need @Luke – mrblue6 Feb 16 '23 at 18:30
  • Thanks. I don't believe there's any way to direct message you on here. I sort of got the hang of the above. However, when it comes to other sites that are dynamically loaded, I really struggle to find how to extract the information I require. For example: https://www.sevenstore.com/launches/ If I disable JavaScript, I see no images. I go to the network tab and see listings, and it says the content type is application/json, but under listings/ it shows as HTML? I cannot get this data at all –  Feb 20 '23 at 16:10
  • @Luke Unfortunately you can't dm on here. Tbh I'm not very good with scraping. But from what I can see, the product info for the sneakers is contained in the
    tag for each sneaker, there is also a link to the product page of the sneaker. You should be able to grab the html of the page and parse it with python, grabbing info you need such as when the draw ends, the name of the sneaker, etc.
    – mrblue6 Feb 21 '23 at 17:13
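For the follow-up in the comments about dropping launches earlier than the current date, a rough sketch of the "if statement before you add to the df" idea. It assumes the same hypothetical launchDate format as above and treats the trailing Z as UTC; keep_upcoming and its now parameter are illustrative names, not part of any library.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed format of launchDate


def keep_upcoming(products, now=None):
    """Return only the products whose launchDate is not earlier than `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    upcoming = []
    for product in products:
        # strptime gives a naive datetime; mark it as UTC so it can be compared.
        launch = datetime.strptime(product["launchDate"], DATE_FORMAT)
        launch = launch.replace(tzinfo=timezone.utc)
        if launch >= now:
            upcoming.append(product)
    return upcoming


# Hypothetical data and a fixed "current" time so the result is repeatable.
sample_products = [
    {"launchDate": "2023-01-15T08:00:00.000Z"},
    {"launchDate": "2023-03-01T08:00:00.000Z"},
]
fixed_now = datetime(2023, 2, 14, tzinfo=timezone.utc)
print(keep_upcoming(sample_products, now=fixed_now))
```

Calling keep_upcoming(products) without the now argument compares against the actual current time.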