0

After going through multiple questions about alignment with format specifiers, I still can't figure out why the numerical data gets printed to stdout in a wavy fashion.

def create_data(soup_object,max_entry=None):
    max_=max_entry
    entry=dict()
    for a in range(1,int(max_)+1):

        entry[a]={'Key':a,
        'Title':soup_object[a].div.text.strip(),
        'Link':soup_object[a].div.a['href'],
        'Seeds':soup_object[a](attrs={'align':'right'})[0].text.strip(),
        'Leechers':soup_object[a](attrs={'align':'right'})[1].text.strip()}

        yield entry[a]

tpb_get_data=tuple(create_data(soup_object=tpb_soup.body.table.find_all("tr"),max_entry=5))
for data in tpb_get_data:
    print('{0} {1:<11}  {2:<25} {3:<25} '.format(data['Key'], data['Title'], data['Seeds'],data['Leechers']))

I also tried f-strings with the format specifiers, but the data still prints as shown below. Can someone please help me figure this out?

 1 Salvation.S02E11.HDTV.x264-KILLERS  262         19 
 2 Salvation.S02E13.WEB.x264-TBS[ettv]  229         25 
 3 Salvation.S02E08.HDTV.x264-KILLERS  178         21 
 4 Salvation.S02E01.HDTV.x264-KILLERS  144          11 
 5 Salvation.S02E09.HDTV.x264-SVA[ettv]  129       14

I have read most of the questions about this. A library like tabulate does an excellent job, but I would like to learn whether there is a raw way to do this without any library.

Georgy
Leon N
  • 1
    You chose strange numbers for alignment. Why is it `1:<11` when those strings are at least 34 characters long? Try something like `'{0} {1:<40} {2:<3} {3:<2}'`. – Georgy Oct 26 '18 at 09:35
  • Also, these are not f-strings! With [f-strings](https://docs.python.org/3/whatsnew/3.6.html#pep-498-formatted-string-literals) you would have `print(f"{data['Key']} {data['Title']:<40} {data['Seeds']:<3} {data['Leechers']:<2}")` – Georgy Oct 26 '18 at 09:37
  • @Georgy I'm quite new to the formatting hence I didn't know what those numbers do. I know those aren't f-strings, I used them too but didn't post them here. Thanks for your input. – Leon N Oct 26 '18 at 10:57

3 Answers

4

You get a misaligned result because you did not count the length of the titles correctly. You reserved only 11 characters for the title, while the first title alone is already 34 characters long.
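As a quick illustration (a minimal sketch using one title from the question's output): the width in a format specifier is a minimum, not a limit, so a value longer than the width is printed in full and pushes everything after it to the right.

```python
title = 'Salvation.S02E11.HDTV.x264-KILLERS'

# Width 11 is smaller than the title, so no padding is added at all.
too_narrow = '{:<11}|'.format(title)
# Width 40 is larger than the title, so it is padded with spaces up to 40.
wide_enough = '{:<40}|'.format(title)

print(len(title))    # 34
print(too_narrow)    # the '|' follows the title directly
print(wide_enough)   # the '|' lands in the same column for any title up to 40 chars
```

With a width of 11, each title simply ends wherever its own length puts it, which is why the columns after it look wavy.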

Easiest is to have your program count for you:

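# str() is needed because 'Key' holds an int in the question's data, and len() does not accept ints
key_len,title_len,seed_len,leech_len = ( max(len(str(item[itemname])) for item in tpb_get_data) for itemname in ['Key','Title','Seeds','Leechers'] )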

fmtstring = '{{:{:d}}} {{:{:d}}} {{:{:d}}} {{:{:d}}}'.format(key_len,title_len,seed_len,leech_len)

for data in tpb_get_data:
    print(fmtstring.format(data['Key'], data['Title'], data['Seeds'],data['Leechers']))

with the much better result

1 Salvation.S02E11.HDTV.x264-KILLERS   262 19
2 Salvation.S02E13.WEB.x264-TBS[ettv]  229 25
3 Salvation.S02E08.HDTV.x264-KILLERS   178 21
4 Salvation.S02E01.HDTV.x264-KILLERS   144 11
5 Salvation.S02E09.HDTV.x264-SVA[ettv] 129 14

(Addendum)

Here is a more generalized approach that uses a list of to-print key names and generates all other required variables on the fly. It does not need hardcoding the names of the variables nor fixing their order: the order is taken from that list. Adjustments of the items to show all go in one place, that same list, get_items. The output separator can be changed in the fmtstring line, for example using a tab or more spaces between the items.

get_items = ['Key','Title','Leechers','Seeds']
lengths = ( max(len(str(item[itemname])) for item in tpb_get_data) for itemname in get_items )  # str() so int values such as 'Key' work too
fmtstring = ' '.join(['{{:{:d}}}' for i in range(len(get_items))]).format(*lengths)

for data in tpb_get_data:
    print(fmtstring.format(*[data[key] for key in get_items]))

It works as follows:

  1. lengths yields, for each key named in get_items, the maximum length of that key's values. (It is a generator expression rather than a list, so it can be consumed only once.)
  2. fmtstring repeats the template {{:{:d}}} once per item and fills in these numbers. format translates the outer {{ and }} into a literal { and }, and replaces the inner {:d} with a length, so the end result is {:number} for each column. These separate format strings are joined into a single longer format string.
  3. Finally, the loop over the actual data prints the items named in get_items. The list comprehension looks them up; the * notation unpacks the list into separate arguments, instead of passing the entire list as one.
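The two-stage formatting in step 2 can be seen in isolation with a small sketch. The widths 1, 36, 3 and 2 are the column widths that the sample data in this question happens to produce; they are hard-coded here only for illustration.

```python
# Stage 1: build the format string. '{{' and '}}' become literal braces,
# the inner '{:d}' is replaced by an integer width.
single = '{{:{:d}}}'.format(36)
print(single)        # {:36}

widths = [1, 36, 3, 2]
fmtstring = ' '.join(['{{:{:d}}}'] * len(widths)).format(*widths)
print(fmtstring)     # {:1} {:36} {:3} {:2}

# Stage 2: use the generated format string on an actual row.
row = fmtstring.format('1', 'Salvation.S02E11.HDTV.x264-KILLERS', '262', '19')
print(row)
```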

Thanks to @Georgy for suggesting to look for a less hardcoded variety.

Jongware
  • Thanks, I now see my foolishness. I never counted the length. @Georgy also gave me the right answer. Thanks a lot for your input. – Leon N Oct 26 '18 at 11:01
  • 1
    This violates DRY – Georgy Oct 26 '18 at 11:03
  • 1
    @usr2564301 [The Zen of Python](https://www.python.org/dev/peps/pep-0020/) says "*Flat is better than nested*" but now you have nested loops. There are still repetitions of `{{:{:d}}}` that could be avoided. Also, if the user decides to add a new key for printing, they will have to edit this code in 5 places! Check out my attempt at this: [link](https://gist.github.com/LostFan123/554cac3e3586a01db512f895eeaa4001). I'll post it here if the question gets reopened. In any case, I take the downvote back, as the OP asked for the reason behind the misalignment, not the most pythonic solution :) – Georgy Oct 27 '18 at 22:44
2

As already mentioned, you calculated lengths of strings incorrectly.
Instead of hardcoding them, delegate this task to your program.

Here is a general approach:

from operator import itemgetter
from typing import (Any,
                    Dict,
                    Iterable,
                    Iterator,
                    List,
                    Sequence)


def max_length(objects: Iterable[Any]) -> int:
    """Returns maximum string length of a sequence of objects"""
    strings = map(str, objects)
    return max(map(len, strings))


def values_max_length(dicts: Sequence[Dict[str, Any]],
                      *,
                      key: str) -> int:
    """Returns maximum string length of dicts values for specific key"""
    return max_length(map(itemgetter(key), dicts))


def to_aligned_data(dicts: Sequence[Dict[str, Any]],
                    *,
                    keys: List[str],
                    sep: str = ' ') -> Iterator[str]:
    """Yields the rows of a left-aligned table built from a sequence of dicts"""
    lengths = (values_max_length(dicts, key=key)
               for key in keys)

    format_string = sep.join(map('{{:{}}}'.format, lengths))

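    for d in dicts:
        # index each key directly: itemgetter(*keys) returns a bare value
        # (not a tuple) when only one key is given, which would break unpacking
        yield format_string.format(*(d[key] for key in keys))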

Examples:

data = [{'Key': '1',
         'Title': 'Salvation.S02E11.HDTV.x264-KILLERS',
         'Seeds': '262',
         'Leechers': '19'},
        {'Key': '2',
         'Title': 'Salvation.S02E13.WEB.x264-TBS[ettv]',
         'Seeds': '229',
         'Leechers': '25'},
        {'Key': '3',
         'Title': 'Salvation.S02E08.HDTV.x264-KILLERS',
         'Seeds': '178',
         'Leechers': '21'},
        {'Key': '4',
         'Title': 'Salvation.S02E01.HDTV.x264-KILLERS',
         'Seeds': '144',
         'Leechers': '11'},
        {'Key': '5',
         'Title': 'Salvation.S02E09.HDTV.x264-SVA[ettv]',
         'Seeds': '129',
         'Leechers': '14'}]
keys = ['Key', 'Title', 'Seeds', 'Leechers']
print(*to_aligned_data(data, keys=keys),
      sep='\n')
# 1 Salvation.S02E11.HDTV.x264-KILLERS   262 19
# 2 Salvation.S02E13.WEB.x264-TBS[ettv]  229 25
# 3 Salvation.S02E08.HDTV.x264-KILLERS   178 21
# 4 Salvation.S02E01.HDTV.x264-KILLERS   144 11
# 5 Salvation.S02E09.HDTV.x264-SVA[ettv] 129 14
keys = ['Title', 'Leechers']
print(*to_aligned_data(data, keys=keys),
      sep='\n')
# Salvation.S02E11.HDTV.x264-KILLERS   19
# Salvation.S02E13.WEB.x264-TBS[ettv]  25
# Salvation.S02E08.HDTV.x264-KILLERS   21
# Salvation.S02E01.HDTV.x264-KILLERS   11
# Salvation.S02E09.HDTV.x264-SVA[ettv] 14
keys = ['Key', 'Title', 'Seeds', 'Leechers']
print(*to_aligned_data(data, keys=keys, sep=' ' * 5),
      sep='\n')
# 1     Salvation.S02E11.HDTV.x264-KILLERS       262     19
# 2     Salvation.S02E13.WEB.x264-TBS[ettv]      229     25
# 3     Salvation.S02E08.HDTV.x264-KILLERS       178     21
# 4     Salvation.S02E01.HDTV.x264-KILLERS       144     11
# 5     Salvation.S02E09.HDTV.x264-SVA[ettv]     129     14

See the Python documentation on format string syntax for more. It includes examples with alignment as well.
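As a small illustration of those alignment options (a sketch that is not part of the functions above; the values are made up for the example), the width can also be passed as a nested replacement field, with `<` for left alignment and `>` for right alignment:

```python
seeds = ['262', '9', '1178']          # made-up values of different lengths
width = max(map(len, seeds))          # widest entry decides the column width

# '{:<{w}}' pads on the right (left-aligned); '{:>{w}}' pads on the left.
left = ['{:<{w}}'.format(s, w=width) for s in seeds]
right = ['{:>{w}}'.format(s, w=width) for s in seeds]

for l, r in zip(left, right):
    print('|' + l + '|' + r + '|')
```

Right alignment is often the more readable choice for numeric columns such as seeds and leechers, since the digits then line up by place value.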

Georgy
  • This looks amazing, and clean. Can I know what the single asterisk argument in the function `values_max_length(dicts: Sequence[Dict[str, Any]], *, key: str) -> int` means? – Leon N Nov 02 '18 at 04:11
  • Never mind I found it, https://stackoverflow.com/questions/2965271/forced-naming-of-parameters-in-python/14298976#14298976 sorry. – Leon N Nov 02 '18 at 04:14
0

Great answer by @Jongware. Just to

  1. make it a bit more general
  2. avoid hard-coded items
  3. print any kind of values, not only strings

here it is:

def print_list_of_dicts_as_table(list_of_dicts, keys=None):
    # assuming all dicts have the same keys
    first_entry = list_of_dicts[0]
    if keys is None:
        keys = list(first_entry.keys())
    num_keys = len(keys)

    # column width: the longest value (as text) or the header, whichever is longer
    max_key_lens = [
        max(len(str(item[k])) for item in list_of_dicts) for k in keys
    ]
    for k_idx, k in enumerate(keys):
        max_key_lens[k_idx] = max(max_key_lens[k_idx], len(str(k)))

    fmtstring = (' | '.join(['{{:{:d}}}'] * num_keys)).format(*max_key_lens)

    # use `keys` for both header and rows, so a caller-supplied selection is respected
    print(fmtstring.format(*keys))
    print(fmtstring.format(*['-'*key_len for key_len in max_key_lens]))
    for entry in list_of_dicts:
        print(fmtstring.format(*[entry[k] for k in keys]))

Usage example:

a=[{'a':'asdd','b':'asd'},{'a':'a','b':'asdsd'},{'a':1,'b':232323}]
print_list_of_dicts_as_table(a)

Output:

a    | b     
---- | ------
asdd | asd   
a    | asdsd 
   1 | 232323
Emil