How can I sort this data as if it were in a dictionary?

Question

I have this text file in which there are certain products, each with the stores in which they are available. Store lines start with tab characters, product lines do not.

To visualize it better, I want to arrange it as a dictionary, with each store name as a key and the list of products available at that store as the value. An example is:

{
    'Store1' : ['product1', 'product2'],
    'Store2' : ...
}

This is an example of the data that I have, stores for each product:

Crucial Ballistix BLT8G4D26BFT4K
- Infor-Ingen
- Bip
- PC Factory
- MyBox
Patriot Signature Line PSD48G266681
- PC Express
- Soluservi
Kingston KCP426NS6/8
- YouTech
- Bip

The expected output would have to be something like this (pretty printed):

{
    'Infor-Ingen' : ['Crucial Ballistix BLT8G4D26BFT4K'     ],
    'Bip'         : ['Crucial Ballistix BLT8G4D26BFT4K',
                     'Kingston KCP426NS6/8'                 ],
    'PC Factory'  : ['Crucial Ballistix BLT8G4D26BFT4K'     ],
    'MyBox'       : ['Crucial Ballistix BLT8G4D26BFT4K'     ],
    'PC Express'  : ['Patriot Signature Line PSD48G266681'  ],
    'Soluservi'   : ['Patriot Signature Line PSD48G266681'  ],
    'YouTech'     : ['Kingston KCP426NS6/8'                 ]
}

And I have this code:

from collections import OrderedDict

od = OrderedDict()
tienda, producto ,otra,aux,wea= [], [],[], [],[]

with open("rams.txt","r") as f:
    data = f.readlines()
    for linea in data:
        linea = linea.strip('\n')
        if '\t' in linea:
            tienda.append(linea.strip('\t'))
            aux.append(linea.strip("\t").strip("\n"))
        else:
            otra.append(aux)
            aux=[]
            producto.append(linea)
            aux.append(linea.strip("\n"))
    tienda = sorted(list(set(tienda)))
    for i in range(1,len(otra)):
        wea=[]
        for key in tienda:
            if key in otra[i]:
                wea.append(otra[i][0])
                od[key] = wea

The problem is that when I print the dictionary, it gives me something like this:

('Bip', ['Crucial Ballistix BLT8G4D26BFT4K ']), ('Infor-Ingen', ['Crucial Ballistix BLT2K8G4D26BFT4K ']), ('MyBox', ['Crucial Ballistix CT16G4DFD8266']),..)

Is your problem with how the parentheses are printed? That output is defined by the default `__str__` and `__repr__` methods of the `OrderedDict` class. There are a few ways to change those methods, but I would recommend building your own. Take a look at this [question](https://stackoverflow.com/questions/4301069/any-way-to-properly-pretty-print-ordered-dictionaries) and particularly this [answer](https://stackoverflow.com/a/4303996/5318634) — Pablo, Jun 19 '21 at 02:05

score 0 · Accepted Answer · answered Jun 19 '21 at 02:21

You are having some problems parsing your file. Take a moment to work out what you are trying to accomplish, given the format of your data.

The file consists of lines that can be grouped into sets of:

A non-indented line containing a product name
Followed by indented lines containing the stores that have the product

So, when you read a product, you should remember that product until a new product is read.

For each store that you read, you should add the current product to the list of products that the store has available. For this you need a dictionary in which the keys are the store names and the values are lists of products.

Keep in mind that you must check whether the store already exists in the dictionary before trying to append the product.

One way to solve it would be this:

products_by_store = dict()
with open("rams.txt", "r") as f:
    cur_prod = None  # most recent product line seen so far
    for linea in f:
        linea = linea.strip('\n')
        if '\t' in linea:
            # Store line: record the current product under this store.
            linea = linea.strip('\t')
            if cur_prod:
                if linea not in products_by_store:
                    products_by_store[linea] = [cur_prod]
                else:
                    products_by_store[linea].append(cur_prod)
        else:
            # Product line: remember it for the store lines that follow.
            cur_prod = linea
for k, v in products_by_store.items():
    print(k, v)

This prints the following output:

Infor-Ingen ['Crucial Ballistix Tactical Tracer BLT8G4D26BFT4K']
Bip ['Crucial Ballistix Tactical Tracer BLT8G4D26BFT4K', 'Kingston KCP426NS6/8']
PC Factory ['Crucial Ballistix Tactical Tracer BLT8G4D26BFT4K']
MyBox ['Crucial Ballistix Tactical Tracer BLT8G4D26BFT4K']
PC Express ['Patriot Signature Line PSD48G266681']
Soluservi ['Patriot Signature Line PSD48G266681']
YouTech ['Kingston KCP426NS6/8']

Of course you should adapt it to your needs. You say something about using an ordered set. It should be trivial to sort the elements once you have everything in place.
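As a rough sketch of that sorting step: the snippet below assumes a dictionary shaped like products_by_store and builds a new one with the stores in alphabetical order and each product list sorted. The sample data and the helper name sorted_by_store are made up for illustration. It relies on plain dicts keeping insertion order, which holds from Python 3.7 onwards.

```python
# Hypothetical sample, shaped like the products_by_store dictionary.
products_by_store = {
    "Infor-Ingen": ["Crucial Ballistix BLT8G4D26BFT4K"],
    "Bip": ["Kingston KCP426NS6/8", "Crucial Ballistix BLT8G4D26BFT4K"],
    "PC Factory": ["Crucial Ballistix BLT8G4D26BFT4K"],
}


def sorted_by_store(mapping):
    """Return a new dict: stores in alphabetical order, products sorted."""
    return {store: sorted(mapping[store]) for store in sorted(mapping)}


ordered = sorted_by_store(products_by_store)
for store, products in ordered.items():
    print(store, products)
```

The original dictionary is left untouched, so you can keep the file order around if you still need it.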

paxdiablo · Answer 2 · 2021-06-19T06:04:46.167

First things first - with regard to just using print() on a class, keep in mind the generally accepted purpose of __str__() (which is what print() calls to weave its magic). It's meant to be a human-readable representation of the object.

Hence the default __str__() for OrderedDict is doing exactly what is intended. It is not necessarily what you would want to see for your specific case but the solution to that is to realise that this would be better done as an abstraction of OrderedDict.
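To make that concrete, here is a tiny sketch with a hypothetical subclass named Labelled (not part of any library). It only overrides __str__(), so print() changes while repr() keeps the inherited behaviour. Strictly speaking, OrderedDict does not define its own __str__(), so str() falls back to __repr__(), which is why the question's output looks the way it does.

```python
from collections import OrderedDict


class Labelled(OrderedDict):
    """Hypothetical subclass used only for this illustration."""

    def __str__(self):
        # One "key -> value" pair per entry, joined with commas.
        return ", ".join(f"{k} -> {v}" for k, v in self.items())


d = Labelled()
d["Bip"] = ["Kingston KCP426NS6/8"]

print(d)        # goes through the overridden __str__()
print(repr(d))  # still the inherited OrderedDict-style representation
```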

Part of Python's power (as an object-oriented language) is its ability to define new classes based on current ones, that add whatever extra behaviour or state you desire.

For your case, I would be implementing an OrderedDict sub-class and changing the output of __str__() to generate whatever format you need, something like:

from collections import OrderedDict

class ProductDb(OrderedDict):
    # Optional file to constructor to load immediately.

    def __init__(self, fspec=None):
        super().__init__()
        if fspec is not None:
            self.load(fspec)

    # Allow reloading at any point.

    def load(self, fspec):
        # Remove all existing information.

        self.clear()

        # For the Espanol-challenged amongst us:
        #     archivo = file
        #        este = this
        #       linea = line
        #    producto = product
        #      tienda = shop

        with open(fspec, 'r') as archivo:
            # To handle missing product line at start of file,
            # start with a fixed value. If first line IS a
            # product, it will simply replace this fixed value.
            # Then we process each line, sans newline character.

            este_producto = 'UNKNOWN'

            for linea in archivo.readlines():
                linea = linea.strip('\n')

                # Tienda lines start with tabs, producto lines do not.

                if linea.startswith('\t'):
                    tienda = linea.strip('\t')

                    # Make NEW shops start with empty product list.
                    # Then we can just add current product to the
                    # list, not caring if shop was new.

                    if tienda not in self:
                        self[tienda] = []
                    self[tienda].append(este_producto)
                else:
                    # Change current product so subsequent
                    # stores pick that up instead.

                    este_producto = linea

        # Then, for each dictionary entry (store), de-dupe
        # and sort list (products), giving sorted products
        # within each store (stores keep first-seen order).
        # Use a copy of the keys, this ensures no changes to
        # the dictionary while you're iterating over it.

        for key in list(self.keys()):
            self[key] = sorted(list(set(self[key])))

    def __str__(self):
        def spc(n): return " " * n

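        # An empty database has nothing to lay out, and
        # max() would raise on an empty sequence.

        if not self:
            return "{}"

        # Get maximum store/product lengths for formatting.

        max_st = max([len(st) for st in self])
        max_pr = max([len(pr) for st in self for pr in self[st]])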

        out = ""
        st_sep = f"{{\n{spc(4)}"
        for st in self:
            out += f"{st_sep}'{st}'{spc(max_st-len(st))} : "
            pr_sep = "["
            for pr in self[st]:
                out += f"{pr_sep}'{pr}'"
                pr_sep = f",{spc(max_pr-len(pr))}\n{spc(max_st+10)}"
            out += f"{spc(max_pr-len(self[st][-1])+1)}]"
            st_sep = f",\n{spc(4)}"
        out += f"\n}}"

        return out

xyzzy = ProductDb('infile.txt')
print(xyzzy)

You'll notice I've also made some fairly hefty changes to the file loader method, other than just making it a method of the class.

Your original file loading code doesn't need to be anywhere near as complex as it currently is. Specifically, you can get rid of all those temporary lists just by constructing a dictionary of lists on the fly (the in-code comments should hopefully explain things).
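The "dictionary of lists on the fly" idea can also be sketched on its own, outside the class. This version is only an illustration under a few assumptions: it takes an iterable of lines instead of a file name, uses collections.defaultdict so the "is the store new?" check disappears, and (unlike the class above) silently skips store lines that appear before any product. The name group_by_store and the sample lines are hypothetical.

```python
from collections import defaultdict


def group_by_store(lines):
    """Map each store to the products listed above it.

    Lines starting with a tab are stores; other non-blank lines are products.
    """
    stores = defaultdict(list)
    current = None  # most recent product line
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("\t"):
            if current is not None:
                stores[line.strip()].append(current)
        else:
            current = line.strip()
    return dict(stores)


sample = [
    "Kingston KCP426NS6/8\n",
    "\tYouTech\n",
    "\tBip\n",
    "Patriot Signature Line PSD48G266681\n",
    "\tBip\n",
]
result = group_by_store(sample)
print(result)
```

Since an open file is itself an iterable of lines, you could pass one straight in.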

I've used the following infile.txt test file (with a single tab at the start of the shop lines):

Crucial Ballistix BLT8G4D26BFT4K
    Infor-Ingen
    Bip
    PC Factory
    MyBox
Patriot Signature Line PSD48G266681
    PC Express
    Soluservi
Kingston KCP426NS6/8
    YouTech
    Bip

The output is as follows, close enough to what you asked for:

{
    'Infor-Ingen' : ['Crucial Ballistix BLT8G4D26BFT4K'    ],
    'Bip'         : ['Crucial Ballistix BLT8G4D26BFT4K',
                     'Kingston KCP426NS6/8'                ],
    'PC Factory'  : ['Crucial Ballistix BLT8G4D26BFT4K'    ],
    'MyBox'       : ['Crucial Ballistix BLT8G4D26BFT4K'    ],
    'PC Express'  : ['Patriot Signature Line PSD48G266681' ],
    'Soluservi'   : ['Patriot Signature Line PSD48G266681' ],
    'YouTech'     : ['Kingston KCP426NS6/8'                ]
}
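If the exact column alignment is not essential, a lighter option (not what the class above does) is to let the standard json module do the indentation. This is just a sketch with made-up sample data; note that the result uses JSON conventions, so strings come out in double quotes.

```python
import json
from collections import OrderedDict

# Hypothetical sample data in the same shape as the ProductDb contents.
stores = OrderedDict()
stores["Bip"] = ["Crucial Ballistix BLT8G4D26BFT4K", "Kingston KCP426NS6/8"]
stores["YouTech"] = ["Kingston KCP426NS6/8"]

# indent=4 puts each key and each list item on its own indented line.
text = json.dumps(stores, indent=4)
print(text)
```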
