I want to obtain the top 3 and bottom 3 cities and item categories by total sales, but so far I can only print every city and item with the sales from each individual line. Can I get my desired output without using a dict? Or, if I do use a dict, how do I get it?

purchases.txt

2012-01-01  09:00   San Jose    Men's Clothing  214.05  Amex
2012-01-01  09:00   Fort Worth  Women's Clothing    153.57  Visa
2012-01-01  09:00   San Diego   Music   66.08   Cash
2012-01-01  09:00   Pittsburgh  Pet Supplies    493.51  Discover
2012-01-01  09:00   Omaha   Children's Clothing 235.63  MasterCard
2012-01-01  09:00   Stockton    Men's Clothing  247.18  MasterCard
2012-01-01  09:00   Austin  Cameras 379.6   Visa
2012-01-01  09:00   New York    Consumer Electronics    296.8   Cash
2012-01-01  09:00   Corpus Christi  Toys    25.38   Discover
2012-01-01  09:00   Fort Worth  Toys    213.88  Visa

test.py

    import sys

    def separator():
        print("=" * 48)

    city_seen = set()
    item_seen = set()

    citysaleslist = []
    itemsaleslist= []

    for line in open(sys.argv[1]):
        fields = line.split('\t')
        city = fields[2]
        item = fields[3]
        strsales = line.split()[-2]  # sales amount as text, used for printing
        sales = float(strsales)

        if city not in city_seen: # if city is not a duplicate, add to city_seen set
            city_seen.add(city)

        #Pressing tab for the bottom 2 lines will remove duplicate but combining the sales for the duplicates is impossible here.
        citysales="{0:<29}{1:>18}".format(city,strsales)
        citysaleslist.append(citysales)

        if item not in item_seen: # if item is not a duplicate, add to item_seen set
            item_seen.add(item)

        #Pressing tab for the bottom 2 lines will remove duplicate but combining the sales for the duplicates is impossible here.
        itemsales = "{0:<29}{1:>18}".format(item,strsales)
        itemsaleslist.append(itemsales)


    print("Top Three Cities \n")
    separator()
    for i in citysaleslist:
        print(i)
    separator()

    print("Bottom Three Cities \n")
    separator()
    separator()

    print("Top Three Item Categories")
    separator()
    for i in itemsaleslist:
        print(i)
    separator()

    print("\nBottom Three Item Categories")
    separator()
    separator()

My output:

Top Three Cities 

================================================
San Jose                                 214.05
Fort Worth                               153.57
San Diego                                 66.08
Pittsburgh                               493.51
Omaha                                    235.63
Stockton                                 247.18
Austin                                    379.6
New York                                  296.8
Corpus Christi                            25.38
Fort Worth                               213.88
================================================
Bottom Three Cities 

================================================
================================================


Top Three Item Categories
================================================
Men's Clothing                           214.05
Women's Clothing                         153.57
Music                                     66.08
Pet Supplies                             493.51
Children's Clothing                      235.63
Men's Clothing                           247.18
Cameras                                   379.6
Consumer Electronics                      296.8
Toys                                      25.38
Toys                                     213.88
================================================

Bottom Three Item Categories
================================================
================================================

Desired output:

Top Three Cities 

================================================
Pittsburgh                               493.51
Austin                                   379.60
Fort Worth                               367.45
================================================
Bottom Three Cities 

================================================
Omaha                                     235.63
San Jose                                  214.05
San Diego                                  66.08
================================================


Top Three Item Categories
================================================
Pet Supplies                             493.51
Men's Clothing                           461.23
Cameras                                   379.6
================================================

Bottom Three Item Categories
================================================
Toys                                      239.26
Children's Clothing                       235.63
Women's Clothing                          153.57
================================================
john tan
  • You may want to post this on https://codereview.stackexchange.com/ first. In its current state it won't run, and even if you fix that it's still very hard to read. – Bailey Parker Jan 22 '18 at 04:26
  • Some general advice, though: I'd look into [`namedtuple`](https://docs.python.org/3/library/collections.html#collections.namedtuple). I'd make one for each line in the purchases file: `namedtuple('Purchase', ('datetime', 'location', 'type', 'amount', 'card_brand'))` and store them in a list. Then something like finding the top cities is just `amount_by_city = defaultdict(int); for purchase in purchases: amount_by_city[purchase.location] += purchase.amount` – Bailey Parker Jan 22 '18 at 04:30
  • [Don't use float for money](https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency). Use [`decimal`](https://docs.python.org/3.0/library/decimal.html). – Bailey Parker Jan 22 '18 at 04:34
  • You seem to repeat yourself a lot (`thing = line.split('\t')[...]`). You should look into [tuple unpacking](https://www.developer.com/lang/other/article.php/630101/Learn-to-Program-using-Python-Unpacking-Tuples.htm). It will make your parsing much easier, since each line has six tab-separated fields: `date, time, city, type, amount, card_brand = line.rstrip('\n').split('\t')` – Bailey Parker Jan 22 '18 at 04:36
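
The suggestions in the comments above (a `namedtuple` per line, tuple unpacking, `Decimal` for money, and a `defaultdict` for totals) could be combined roughly as follows. This is only a sketch: the field names, the helper names `parse_purchase` and `total_by`, and the assumption that every line has exactly six tab-separated columns are illustrative, not part of the original post.

```python
from collections import defaultdict, namedtuple
from decimal import Decimal

# Hypothetical field names; the sample rows have six tab-separated columns.
Purchase = namedtuple(
    "Purchase", ("date", "time", "city", "item", "amount", "card_brand")
)

def parse_purchase(line):
    """Turn one tab-separated line into a Purchase with a Decimal amount."""
    date, time, city, item, amount, card_brand = line.rstrip("\n").split("\t")
    return Purchase(date, time, city, item, Decimal(amount), card_brand)

def total_by(purchases, field):
    """Sum the amounts, grouped by the given Purchase field name."""
    totals = defaultdict(Decimal)  # Decimal() is Decimal('0')
    for purchase in purchases:
        totals[getattr(purchase, field)] += purchase.amount
    return totals

# Three rows taken from the question's sample data.
sample = [
    "2012-01-01\t09:00\tFort Worth\tWomen's Clothing\t153.57\tVisa\n",
    "2012-01-01\t09:00\tSan Diego\tMusic\t66.08\tCash\n",
    "2012-01-01\t09:00\tFort Worth\tToys\t213.88\tVisa\n",
]
purchases = [parse_purchase(line) for line in sample]
amount_by_city = total_by(purchases, "city")
print(amount_by_city["Fort Worth"])  # 367.45
```

Because the amounts are `Decimal` values built from the original text, the two Fort Worth sales add up exactly, with no binary floating-point rounding.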

2 Answers

0

You need to sort your data. Once you sort your data, you can print out the data as you desire.

# Brief example (but not a direct solution!)
aList = [
    ["citya", "22"],
    ["cityc", "44"],
    ["cityb", "55"],
]
# Convert to float in the key so values compare as numbers, not as text
aSortedList = sorted(aList, key=lambda x: float(x[1]))
# Now pick how you want to get your information from the sorted list.
# Hint: you have already read the information.

But your code does NOT keep each city or item linked to its sales figure: it only stores pre-formatted strings. So either create a new list that holds all the information together, OR track the information as you go.
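
As a rough sketch of the "new list with all the information" option, the following keeps each name paired with its sale, combines duplicates without a dict by sorting and grouping, and then sorts by total. The `records` list and the helper name `totals_sorted` are made up for illustration; the values come from the question's sample.

```python
from itertools import groupby
from operator import itemgetter

# (city, sale) pairs as they might be collected while reading the file.
records = [
    ("San Jose", 214.05), ("Fort Worth", 153.57), ("San Diego", 66.08),
    ("Pittsburgh", 493.51), ("Fort Worth", 213.88),
]

def totals_sorted(pairs):
    """Return (name, total) pairs sorted by total, smallest first, without a dict."""
    # groupby only merges adjacent items, so sort by name first.
    by_name = sorted(pairs, key=itemgetter(0))
    totals = [
        (name, sum(sale for _, sale in group))
        for name, group in groupby(by_name, key=itemgetter(0))
    ]
    return sorted(totals, key=itemgetter(1))

ranked = totals_sorted(records)
top_three = ranked[-3:][::-1]  # largest total first
bottom_three = ranked[:3]      # smallest total first

for name, total in top_three:
    print("{0:<29}{1:>18.2f}".format(name, total))
```

The same function works for item categories if the pairs are built as `(item, sale)` instead.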

Part of the issue is that you are trying to do everything immediately as you read the file. At that point the data has only been read; it has NOT been combined or sorted. Finding minimums and maximums does NOT require sorting, but then you'll need to TRACK the information as you go.

if new_value > max1:
    max3 = max2
    max2 = max1
    max1 = new_value
elif new_value > max2:
    ...

and you'll have to loop through it all.
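
A complete version of that tracking idea might look like the sketch below. The function name `top_three` is hypothetical, and the input is assumed to be a list of already combined totals (duplicates for the same city or item have to be summed before this step). The standard library's `heapq.nlargest` and `heapq.nsmallest` do the same job and are shown for comparison.

```python
import heapq

def top_three(values):
    """Track the three largest values in one pass, without sorting."""
    max1 = max2 = max3 = float("-inf")
    for new_value in values:
        if new_value > max1:
            # New largest: shift the previous first and second down.
            max1, max2, max3 = new_value, max1, max2
        elif new_value > max2:
            max2, max3 = new_value, max2
        elif new_value > max3:
            max3 = new_value
    return [max1, max2, max3]

sales = [214.05, 153.57, 66.08, 493.51, 235.63]
print(top_three(sales))           # [493.51, 235.63, 214.05]
print(heapq.nlargest(3, sales))   # [493.51, 235.63, 214.05]
print(heapq.nsmallest(3, sales))  # [66.08, 153.57, 214.05]
```

With fewer than three input values, the unused slots stay at negative infinity, so a real program would need to handle that case.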

Tracking works well when all the information arrives in a single pass. But if the information can change later, keeping the full data set and looking values up in it (for example, by sorting) might be easier.

0

You can use a dictionary to get your desired output.
Use two dictionaries, one for cities and the other for items, add each sale to the matching entry, and then sort each dictionary by its values.

Example:

import operator
lines = """2012-01-01    09:00    San Jose    Men's Clothing    214.05    Amex
2012-01-01    09:00    Fort Worth    Women's Clothing    153.57    Visa
2012-01-01    09:00    San Diego    Music    66.08    Cash
2012-01-01    09:00    Pittsburgh    Pet Supplies    493.51    Discover
2012-01-01    09:00    Omaha    Children's Clothing    235.63    MasterCard
2012-01-01    09:00    Stockton    Men's Clothing    247.18    MasterCard
2012-01-01    09:00    Austin    Cameras    379.6    Visa
2012-01-01    09:00    New York    Consumer Electronics    296.8    Cash
2012-01-01    09:00    Corpus Christi    Toys    25.38    Discover
2012-01-01    09:00    Fort Worth    Toys    213.88    Visa""".split("\n")


cities = {}
itemsVal = {}
for i in lines:
    val = i.split("    ")
    if val[2] not in cities:
        cities[val[2]] = float(val[-2])
    else:
        cities[val[2]] += float(val[-2])
    if val[-3] not in itemsVal:
        itemsVal[val[-3]] = float(val[-2])
    else:
        itemsVal[val[-3]] += float(val[-2])


cities = sorted(cities.items(), key=operator.itemgetter(1))      #Sort by sales value
itemsVal = sorted(itemsVal.items(), key=operator.itemgetter(1))  #Sort by sales value



lineSep = "="*48
print("Top Three Cities")
print(lineSep)
for i in reversed(cities[-3:]):
    print("{0:<29}{1:>18}".format(i[0], i[1]))

print(lineSep)
print("\n")
print("Bottom Three Cities")
print(lineSep)
for i in cities[:3]:
    print("{0:<29}{1:>18}".format(i[0], i[1]))
print(lineSep)
print("\n")

print("Top Three Item Categories")
print(lineSep)
for i in reversed(itemsVal[-3:]):
    print("{0:<29}{1:>18}".format(i[0], i[1]))

print(lineSep)
print("\n")
print("Bottom Three Item Categories")
print(lineSep)
for i in itemsVal[:3]:
    print("{0:<29}{1:>18}".format(i[0], i[1]))
print(lineSep)
print("\n")

Result:

Top Three Cities
================================================
Pittsburgh                               493.51
Austin                                    379.6
Fort Worth                               367.45
================================================


Bottom Three Cities
================================================
Corpus Christi                            25.38
San Diego                                 66.08
San Jose                                 214.05
================================================


Top Three Item Categories
================================================
Pet Supplies                             493.51
Men's Clothing                           461.23
Cameras                                   379.6
================================================


Bottom Three Item Categories
================================================
Music                                     66.08
Women's Clothing                         153.57
Children's Clothing                      235.63
================================================
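
A variant of the same dictionary approach, sketched with `collections.Counter` and the `csv` module, reads real tab-separated lines and prints amounts with two decimals (so `379.6` appears as `379.60`, as in the question's desired city output). The function name `sales_totals` and the six-column layout are assumptions; `float` mirrors the code above, although `decimal.Decimal` is an alternative for money.

```python
import csv
from collections import Counter

def sales_totals(lines):
    """Return (city_totals, item_totals) from tab-separated purchase lines."""
    city_totals, item_totals = Counter(), Counter()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 6:
            continue  # skip blank or malformed rows
        _date, _time, city, item, amount, _card = row
        city_totals[city] += float(amount)
        item_totals[item] += float(amount)
    return city_totals, item_totals

# A few rows from the question; a file works the same way, for example:
#     with open("purchases.txt", newline="") as f:
#         cities, items = sales_totals(f)
sample = [
    "2012-01-01\t09:00\tFort Worth\tWomen's Clothing\t153.57\tVisa",
    "2012-01-01\t09:00\tPittsburgh\tPet Supplies\t493.51\tDiscover",
    "2012-01-01\t09:00\tAustin\tCameras\t379.6\tVisa",
    "2012-01-01\t09:00\tCorpus Christi\tToys\t25.38\tDiscover",
    "2012-01-01\t09:00\tFort Worth\tToys\t213.88\tVisa",
]
cities, items = sales_totals(sample)

top_cities = cities.most_common(3)          # largest totals first
bottom_cities = cities.most_common()[:-4:-1]  # smallest totals first
for city, total in top_cities:
    print("{0:<29}{1:>18.2f}".format(city, total))
```

`Counter.most_common(n)` returns the `n` entries with the largest values, which removes the need for an explicit `sorted` call and `reversed` slice.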
Rakesh
  • You are forgetting that the sales for duplicates have to be combined. So the sales for "Toys" would be '239.26' instead of '25.38', "Fort Worth" would be '367.45' instead of '153.57', and "Men's Clothing" would be included in the top 3 item categories with '461.23'. – john tan Jan 22 '18 at 14:14
  • @johntan: In that case you can use a dictionary. I have updated the solution. – Rakesh Jan 22 '18 at 14:49