I have this piece of code:

    for play_type in play_codes['general']:
        if play_type in play_tx:
            code = play_codes['general'][play_type]['code']
            break

which references a dictionary, `play_codes` (a portion is reproduced below):

play_codes = {
              'general':{
                         'falló en':            {'code':'',   'description':'generic out'},
                         'se ponchó':           {'code':'K',  'description':'strike out'},
                         'por base por bolas':  {'code':'BB', 'description':'walk'}
                        }
             }

The loop goes through each play_type key and, on the first one found inside play_tx, assigns that entry's 'code' value to code (basically an if/elif chain driven by the dictionary).
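For reference, the lookup described above can be wrapped in a small function. This is only a sketch: `find_code` is a hypothetical name, the dictionary is the excerpt shown above, and the `u''` literals are an assumption made so the snippet behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: find_code is a hypothetical helper, not part of the original script.

play_codes = {
    'general': {
        u'falló en':           {'code': '',   'description': 'generic out'},
        u'se ponchó':          {'code': 'K',  'description': 'strike out'},
        u'por base por bolas': {'code': 'BB', 'description': 'walk'},
    }
}


def find_code(play_tx, codes):
    """Return the code of the first play type contained in play_tx, else None."""
    for play_type, info in codes.items():
        if play_type in play_tx:
            return info['code']
    return None


# Example call with the scraped sentence quoted in the comments.
code = find_code(u"M.Brito falló en rolata al lanzador.", play_codes['general'])
```

Returning `None` when nothing matches keeps "no match" distinct from the empty code used for a generic out.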

The loop works beautifully, except when play_type contains a non-ASCII (UTF-8 encoded) character, such as an accented letter. Then I get:

TypeError: 'in <string>' requires string as left operand

After this I'm going to parse Korean, so it's something I need to master!

  • What is the Python version you're using and what does `play_type` look like when it goes wrong? – Simeon Visser Nov 29 '14 at 00:57
  • And what is `play_tx`? – Marcin Nov 29 '14 at 01:28
  • Python 2.6. play_tx is a scraped string: "M.Brito falló en rolata al lanzador." The offending play_type is "bate├│ rolata", as printed at a DOS prompt. – Brian L Cartwright Nov 29 '14 at 01:56
  • I wrote the output to a MySQL table to make sure the UTF-8 encoding was coming through correctly: "bolas intencional" "M.Brito falló en rolata al lanzador." "bateó rolata" "M.Brito falló en rolata al lanzador." – Brian L Cartwright Nov 29 '14 at 02:00
  • Is this your *actual* code? Are you sure you don't have something like `for play_type in play_codes:` (missing the `['general']`) which would make play_type a `dict` instead of a `str`? – Peter Gibson Nov 29 '14 at 02:40
  • For debugging put a `try`/`except` block around the `if` and print `type(play_type)` to see what actual type Python thinks it is. – Mark Ransom Nov 29 '14 at 04:36
  • Peter Gibson - yes, that is the code. By specifying "for play_type in play_codes['general']", play_type takes each key inside ['general'] in turn (falló en, se ponchó and all the others). The loop compares each key to play_tx until it finds a match, then sets the variables and exits. Written that way it works like an if/elif chain without having to specify the elements anywhere other than in the dictionary. Mark - I didn't put in a try because it was already showing me an error message, but I'll see what it returns. – Brian L Cartwright Nov 29 '14 at 20:42
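
Mark Ransom's debugging suggestion could look something like the following. It is a sketch: `check_match` is a hypothetical helper name, and the byte-string key is simulated with `.encode('utf-8')` rather than taken from the real script.

```python
# -*- coding: utf-8 -*-
# Sketch only: check_match is a hypothetical debugging helper.

def check_match(play_type, play_tx):
    """Test `play_type in play_tx`; if the test raises, print both operand types."""
    try:
        return play_type in play_tx
    except (TypeError, UnicodeError) as exc:
        print("%s | play_type is %s, play_tx is %s"
              % (exc, type(play_type).__name__, type(play_tx).__name__))
        return False


play_tx = u"M.Brito falló en rolata al lanzador."

# Both operands are text: the containment test runs normally.
text_result = check_match(u"falló en", play_tx)

# Simulated mismatch: a UTF-8 byte string tested against text.
bytes_result = check_match(u"falló en".encode("utf-8"), play_tx)
```

If the two printed type names differ, the mismatch between them is the first thing to look at.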

2 Answers

This code works for me. Note the declaration of the file encoding at the top. That may be your problem (i.e. the source encoding is defaulting to ASCII).

# -*- coding: utf-8 -*-

play_codes = {
    'general':{
        'falló en':            {'code':'',   'description':'generic out'},
        'se ponchó':           {'code':'K',  'description':'strike out'},
        'por base por bolas':  {'code':'BB', 'description':'walk'}
    }
}

#play_tx = ('falló en', 'se ponchó', 'por base por bolas')
play_tx = "M.Brito falló en rolata al lanzador."

for play_type in play_codes['general']:
    if play_type in play_tx:
        code = play_codes['general'][play_type]['code']
        break

If that turns out to be the problem, see here for more info: https://www.python.org/dev/peps/pep-0263/

Hmm, this works even when I set the encoding to ASCII, so now I'm a little confused. Please post all the code.

demented hedgehog
  • I copied your code, inserted "print code", and it worked fine. I did have a coding comment (`# coding: utf-8`) in the file where the dictionary was defined, but even when I switched to the version you used and included it in both of the Python files, I got the same error. I think this points me in the right direction, though, and I'll continue to test. – Brian L Cartwright Nov 29 '14 at 03:17
  • Yeah, there's something weird going on, I suspect. You'll need to post the broken code verbatim if you want people to give you better answers. The logic looks just fine, so it is puzzling. – demented hedgehog Nov 29 '14 at 03:22
  • When both strings being compared are defined in the Python code, there is no problem. When play_tx is read from a MySQL table, it is a problem. I had this project sitting for a while and now I recall running into something similar with Korean characters, but I thought I solved it a few months ago (I've been looking for the appropriate code fragments). The MySQL table uses the utf8_general_ci collation. – Brian L Cartwright Nov 29 '14 at 03:50
Here's the complete code:

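# -*- coding: utf-8 -*-

import MySQLdb
import MySQLdb.cursors

def db_connect():
  # db and cursor are module-level so the query functions below can use them.
  global db, cursor

  DBUSER = 'root'
  DBPASSWD = 'xxx'
  DB = 'cuba'

  try:
    db = MySQLdb.connect(user=DBUSER, passwd=DBPASSWD, db=DB, charset='utf8',
                         cursorclass=MySQLdb.cursors.DictCursor)
    cursor = db.cursor()
  except MySQLdb.Error:
    print 'Cannot connect to database. Check credentials'
    raise SystemExit


def db_close():
  # Not shown in the original post; assumed to close the cursor and connection.
  cursor.close()
  db.close()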


def list_games():

  query = """SELECT
               game_id,
               season
             FROM cuba.games
             WHERE game_id <> 0
             ORDER BY game_id ASC"""
  cursor.execute(query)

  gamelist = []

  for rec in cursor.fetchall():
    gamelist.append(rec)

  return(gamelist)


def list_pbp(game_id):

  query = """SELECT
               game_id,
               event_id,
               inning_tx,
               play_tx,
               away_score_ct,
               home_score_ct
             FROM cuba.pbp
             WHERE game_id = %s
             ORDER BY event_id"""
  # Let the driver substitute game_id instead of formatting the SQL by hand.
  cursor.execute(query, (game_id,))

  pbplist = []

  for rec in cursor.fetchall():
    pbplist.append(rec)

  return(pbplist)


def main():

  play_codes = {
              'general':   {
                            'falló en':            {'code':'',   'h_cd':'0','event_cd':'2' ,'description':'generic out'},
                            'se ponchó':           {'code':'K',  'h_cd':'0','event_cd':'3' ,'description':'strike out'},
                            'por base por bolas':  {'code':'BB', 'h_cd':'0','event_cd':'14','description':'walk'},
                            'bolas intencional':   {'code':'IBB','h_cd':'0','event_cd':'15','description':'intentional walk'},
                            'se embasó por error': {'code':'E',  'h_cd':'0','event_cd':'18','description':'error'},
                            'bateó rolata':        {'code':'FC', 'h_cd':'0','event_cd':'19','description':'fielders choice'},
                            'bateó sencillo':      {'code':'S',  'h_cd':'1','event_cd':'20','description':'single'},
                            'bateó doble':         {'code':'D',  'h_cd':'2','event_cd':'21','description':'double'},
                            'bateó triple':        {'code':'T',  'h_cd':'3','event_cd':'22','description':'triple'},
                            'bateó cuadrangular':  {'code':'HR/','h_cd':'4','event_cd':'23','description':'homerun'}
                           }
               }


  db_connect()
  gamelist = list_games()

  for game in gamelist:

    game_id = game['game_id']

    pbp = list_pbp(game_id)

    for play in pbp:
      play_tx = play['play_tx']

      code = ''

#      play_tx = 'R.Bordon bateó sencillo en rolata al izquierdo.'
      for play_type in play_codes['general']:

          if play_type in play_tx:
              code = play_codes['general'][play_type]['code']
              print code,play_type, play_tx
              break


  db_close()

if __name__ == '__main__':
    main()
  • Take a look at this: http://stackoverflow.com/questions/6202726/writing-utf-8-string-to-mysql-with-python – demented hedgehog Nov 29 '14 at 05:51
  • So maybe you need to make sure you convert your strings to unicode before you put them in the db (I'm assuming you're running a Python 2.x version, where strings are byte strings and ASCII is the default encoding). – demented hedgehog Nov 29 '14 at 05:53
  • What line does it fail on? This one: `for play in pbp:`? – demented hedgehog Nov 29 '14 at 05:56
  • "Then make sure you are passing unicode objects to your db connection as it will encode it using the charset you passed to the cursor. If you are passing a utf8-encoded string, it will be doubly encoded when it reaches the database." That's possibly the relevant bit of that previous link. – demented hedgehog Nov 29 '14 at 05:57
  • Also try `use_unicode=True` in your connect call, which people seem to think is required. – demented hedgehog Nov 29 '14 at 05:59
  • Also have a shot at printing all the elements of the pbp list. I reckon your problem is more likely to be with encoding, decoding and configuration of unicode strings to and from the db, so it would be good to eliminate the dictionary stuff from where you're looking. – demented hedgehog Nov 29 '14 at 06:03
  • When I compare the dictionary to text defined in a Python statement, it works fine in all cases. When the dictionary is compared to a string retrieved from MySQL, it fails at "if play_type in play_tx" if the dictionary item contains an extended character. I used this same connect string when I scraped the data off the web and into MySQL. At this step I'm retrieving the data from MySQL to be parsed into a warehouse. I should be able to put the dictionary into a MySQL table for lookup, but it will be slower. Thanks for the help. I'll keep working on it this weekend and will check back. – Brian L Cartwright Nov 29 '14 at 14:36
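
Following up on the comments above, one way to rule out a byte-string/unicode mismatch is to decode the dictionary keys once, before the loop. This is a sketch under assumptions, not a confirmed fix: `decode_keys` is a hypothetical helper, the row text is simulated rather than read from MySQL, and it assumes the driver hands back text (unicode) while the dictionary keys are UTF-8 byte strings. In Python 2 the same effect could be had by writing the keys as `u'...'` literals.

```python
# -*- coding: utf-8 -*-
# Sketch only: decode_keys is a hypothetical helper; no database is involved.

def decode_keys(codes, encoding="utf-8"):
    """Return a copy of codes in which byte-string keys are decoded to text."""
    decoded = {}
    for key, value in codes.items():
        if isinstance(key, bytes):
            key = key.decode(encoding)
        decoded[key] = value
    return decoded


# Keys simulated as UTF-8 byte strings, like plain literals in a Python 2 source file.
raw_general = {
    u"falló en".encode("utf-8"):       {"code": "",  "description": "generic out"},
    u"bateó sencillo".encode("utf-8"): {"code": "S", "description": "single"},
}

general = decode_keys(raw_general)

# Simulated text value standing in for a play_tx column read from the database.
play_tx = u"R.Bordon bateó sencillo en rolata al izquierdo."

code = None
for play_type in general:
    if play_type in play_tx:
        code = general[play_type]["code"]
        break
```

Decoding once up front keeps the inner loop unchanged, so the rest of the parsing code would not need to know which encoding the keys were written in.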