1

I am reading text file and writing each line to one of two file in form of list of dict. At the end when I check, pickle contains list of dict. But each dict item is same, and that is last inserted item.

Here logic is, if choice is 1, add in rumour list, 2 then add in norumour list, 3 then break.

What is the bug in this code:

fp = open('130615.txt', 'r')
count1 = 1
count2 = 1
result = {}
final_result_rumour = []
final_result_norumour = []
for val in fp.readlines():
    temp = val.split('|')
    try:
        result['script_code'] = temp[0]
        result['company'] = temp[1]
        result['subject'] = temp[2]
        result['date'] = temp[3]
        result['link'] = temp[4]
        result['description'] = get_description(result['link'])
        if 'merge' in temp[2]  or 'buy' in temp[2]  or 'sale' in temp[2]  or 'tie-up' in temp[2]  or 'tie' in temp[2]  or 'acquire' in temp[2]  or 'amalgamation' in temp[2]  or 'purchase' in temp[2]  or 'amalgamate' in temp[2] or 'acquisition' in temp[2]:
            f1 = open('suspected.txt','a')
            print temp[2]
            flag = raw_input("Enter your choice - ")
            if flag == '1':
                #1.write(temp[2]+'\n')
                print "Rumour suspected : ", count1
                count1 += 1
                final_result_rumour.append(result)
                output1 = open('rumours.pkl', 'wb')
                pickle.dump(final_result_rumour, output1)
                output1.close()
            elif flag == '2':
                print "No Rumour suspected : ", count2
                count2 += 1
                final_result_norumour.append(result)
                output2 = open('norumours.pkl', 'wb')
                pickle.dump(final_result_norumour, output2)
                output2.close()
            elif flag == '3':
                confirm = raw_input('You want to proceed ? ')
                if confirm == '1':
                    break
        else:
            print "No Rumour suspected : ", count2
            count2 += 1
            final_result_norumour.append(result)
            output2 = open('norumours.pkl', 'wb')
            pickle.dump(final_result_norumour, output2)
            output2.close()
    except:
        pass
cyclic
  • 217
  • 1
  • 3
  • 11
  • 1
    BTW, `try: ... except: pass` is **not** a good idea. See [Why is “except: pass” a bad programming practice?](http://stackoverflow.com/q/21553327/4014959) for details. – PM 2Ring Sep 08 '15 at 07:40
  • thanks, could you please give better alternative for this scenario? – cyclic Sep 08 '15 at 07:58
  • There's a lot of good info in the answers to the question I linked you to. You need to decide which exceptions you _want_ your script to be able to recover from, and which exceptions it can safely ignore. Other exceptions should be reported to the user, and terminate the script. For this program, you probably want to save the collected data to the pickle file(s) even when an exception occurs, but that's easy enough to do in the exception handling code. – PM 2Ring Sep 08 '15 at 08:35
  • (cont) BTW, I just noticed that you open the 'suspected.txt' file on each iteration of the for loop, but you never use the file, or close it. You probably don't want to do that. :) In its current state, your script is ignoring all errors that occur inside the `try: ... except:` section, even syntax errors. So as a first step, change `except: pass` to `except: raise` and then run your script with typical data and see what exceptions it actually raises. Next, write except blocks that have named exceptions to handle those exceptions explicitly. – PM 2Ring Sep 08 '15 at 08:35

1 Answers1

3

You are adding the same result dictionary to each of the lists, just changing its previous content in each iteration. Try this:

...
try:
    result = {} # create new dict
    result['script_code'] = temp[0]
    ...

Also, it looks like you are pickle-dumping the result files in each iteration, too. You just need to do this once, after the loop.

tobias_k
  • 81,265
  • 12
  • 120
  • 179
  • thanks for pointing this, but does temp not get updated during each for iteration? – cyclic Sep 08 '15 at 07:39
  • I am dumping every time just to loose the content in case if script breaks in between due to bad input – cyclic Sep 08 '15 at 07:40
  • @cyclic Yes, `temp` is updated, so the values you write in the dictionary are different in each iteration. But you are writing those new values in the same dict, _overwriting_ the values you put there before, thus changing _all the references_ to that dict in your result lists. – tobias_k Sep 08 '15 at 07:41
  • 1
    @cyclic: Dumping your data to the pickle files on every loop is a bit inefficient, although it's (probably) not as bad as it could be, since your OS (probably) caches disk writes. A better strategy is to reorganize your code so that you pickle your data after you exit from the for loop, whether the for loop terminate normally or you break out of it. And if you do that, you can save both lists of dicts to the one pickle file, if you like. – PM 2Ring Sep 08 '15 at 07:55