0

How can I create a plot using information from a database table(mysql)? So for the x axis I would like to use the id column and for the y axis I would like to use items in cart(number). You can use any library as you want if it gives the result that I would like to have. Now in my plot(I attached the photo) on the x label it gives an interval of 500 (0,500,1000 etc) but I would like to have the ids(1,2,3,4,...3024) and for the y label I would like to see the items in cart. I attached the code. I will appreciate any help.

enter image description here

import pymysql
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

conn = pymysql.connect(host='localhost', user='root', passwd='', db='amazon_cart')

cur = conn.cursor()
x = cur.execute("SELECT `id`,`items in cart(number)`,`product title` FROM `csv_9_05`")

plt.xlabel('Product Id')
plt.ylabel('Items in cart(number)')

rows = cur.fetchall()
df = pd.DataFrame([[xy for xy in x] for x in rows])

x=df[0]
y=df[1]

plt.bar(x,y)

plt.show()

cur.close()
conn.close()

SQL OF THE TABLE

DROP TABLE IF EXISTS `csv_9_05`;
CREATE TABLE IF NOT EXISTS `csv_9_05` (
  `id` int(50) NOT NULL AUTO_INCREMENT,
  `product title` varchar(2040) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `product price` varchar(55) NOT NULL,
  `items in cart` varchar(2020) DEFAULT NULL,
  `items in cart(number)` varchar(50) DEFAULT NULL,
  `link` varchar(2024) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=3025 DEFAULT CHARSET=latin1;
ryy77
  • 1,134
  • 5
  • 20
  • 36
  • Could you tell us a little more about your database schema? How is the table `csv_9_05` laid out? – nat5142 Jun 14 '18 at 14:03
  • Yes. I will update the question with the sql of the table. Thanks! – ryy77 Jun 14 '18 at 14:05
  • I updated now! There are 3024 items in the table. The plot is showing with interval(0,500,1000 etc as you see) but I would like to get the ids(1,2,3,4...3024) and also for the y axis to see the items in cart – ryy77 Jun 14 '18 at 14:15

1 Answers1

1

Hm... I think restructuring your database is going to make a lot of things much easier for you. Given the schema you've provided here, I would recommend increasing the number of tables you have and doing some joins. Also, your data type for integer values (the number of items in a cart) should be int, not varchar. Your table fields shouldn't have spaces in their names, and I'm not sure why a product's id and the number of products in a cart are given a 1-to-1 relationship.

But that's a separate issue. Just rebuilding this database is probably going to be more work than the specific task you're asking about. You really should reformat your DB, and if you have questions about how, please tell me. But for now I'll try to answer your question based on your current configuration.

I'm not terribly well versed in Pandas, so I'll answer this without the use of that module.

If you declare your cursor like so:

cursor = conn.cursor(pymysql.cursors.DictCursor)
x = cur.execute("SELECT `id`,`items in cart(number)`,`product title` FROM `csv_9_05`")

Then your rows will be returned as a list of 3024 dictionaries, i.e.:

rows = cursor.fetchall()

# this will produce the following list:
# rows = [
#        {'id': 1, 'items in cart(number)': 12, 'product_title': 'hammer'},
#        {'id': 2, 'items in cart(number)': 5, 'product_title': 'nails'},
#        {...},
#        {'id': 3024, 'items in cart(number)': 31, 'product_title': 'watermelons'}
#    ]

Then, plotting becomes really easy.

plt.figure(1)
plt.bar([x['id'] for x in rows], [y['items in cart(number)'] for y in rows])
plt.xlabel('Product Id')
plt.ylabel('Items in cart(number)')
plt.show()
plt.close()

I think that should do it.

nat5142
  • 485
  • 9
  • 21
  • I changed the varchar to int for items in cart. The problem with the y label is gone but it shows numbers like this:0, 2000, 4000, ....14000 and I would like to show numbers up to 999(this is the maximum value for the items in cart) and how can I get the ids for the x axis(now it shows (0,500,1000,1500,...3000). I would like to show(1,2,3,4,...3024). I think is about the bins. – ryy77 Jun 14 '18 at 15:02
  • Showing all numbers from 1 to 3024 is going to cause some major overlapping. May want to reconsider that (have a tick for every 100, for example). If you want to define ticks yourself, use [plt.xticks and plt.yticks](https://stackoverflow.com/questions/12608788/changing-the-tick-frequency-on-x-or-y-axis-in-matplotlib) – nat5142 Jun 14 '18 at 15:21