
I am counting the number of contractions in a certain set of presidential speeches, and want to output these contractions to a CSV or text file. Here's my code:

import urllib2,sys,os,csv
from bs4 import BeautifulSoup,NavigableString
from string import punctuation as p
from multiprocessing import Pool
import re, nltk
import requests
import math, functools
import summarize
reload(sys)

def processURL_short(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id':'transcript'},{'class':'displaytext'})
    item_str = item_div.text.lower()
    return item_str

every_link_test = ['http://www.millercenter.org/president/obama/speeches/speech-4427',
'http://www.millercenter.org/president/obama/speeches/speech-4424',
'http://www.millercenter.org/president/obama/speeches/speech-4453',
'http://www.millercenter.org/president/obama/speeches/speech-4612',
'http://www.millercenter.org/president/obama/speeches/speech-5502']

data = {}
count = 0
for l in every_link_test:
    content_1 = processURL_short(l)
    for word in content_1.split():
        word = word.strip(p)
        if word in contractions:
            count = count + 1
        splitlink = l.split("/")
        president = splitlink[4]
        speech_num = splitlink[-1]
        filename = "{0}_{1}".format(president,speech_num)
    data[filename] = count
    print count, filename

   with open('contraction_counts.csv','w',newline='') as fp:
        a = csv.writer(fp,delimiter = ',')
        a.writerows(data)

Running that for loop prints out

79 obama_speech-4427
101 obama_speech-4424
101 obama_speech-4453
182 obama_speech-4612
224 obama_speech-5502

I want to export that to a text file, where the numbers on the left are one column, and the president/speech number are in the second column. My with statement just writes each individual row to a separate file, which is definitely suboptimal.

blacksite
  • If you google `write csv with python` you get plenty of answers; [try this one](http://stackoverflow.com/questions/14693646/writing-to-csv-file-python) – Kyle Pittman Oct 08 '15 at 14:21
  • Yeah, I've seen that. That output to CSV essentially put one letter in each column, and didn't even include the contraction count. – blacksite Oct 08 '15 at 14:30
  • I would suggest editing this question or creating a new question regarding the code you tried to use to output the CSV. It's simpler for us to help you with the code you've already tried than for us to write you something from scratch. – Kyle Pittman Oct 08 '15 at 14:33
  • The code I tried is at the tail end of the code above. It starts with `with open('contraction_counts.csv'...` – blacksite Oct 08 '15 at 14:46

2 Answers

You can try something like this. It is a generic method; modify it as you see fit.

import csv
# 'wb+' is a Python 2 file mode; on Python 3 use open(path, 'w', newline='') instead
with open('somepath/file.txt', 'wb+') as outfile:
  w = csv.writer(outfile)
  w.writerow(['header1', 'header2'])
  for i in your_data_structure: # assuming a list of two-item rows
    w.writerow([
      i[0],
      i[1],
    ])

or, if it is a dictionary:

import csv
# 'wb+' is a Python 2 file mode; on Python 3 use open(path, 'w', newline='') instead
with open('somepath/file.txt', 'wb+') as outfile:
  w = csv.writer(outfile)
  w.writerow(['header1', 'header2'])
  for k, v in your_dictionary.items(): # k is the key, v is the value
    w.writerow([
      k,
      v,
    ])
reticentroot
  • I still get each letter of `filename` output to individual columns in 'contraction_counts.csv'... I'm trying to get one column for `counts` and another for `filename`. – blacksite Oct 08 '15 at 23:27
  • Then you'll have to modify your data structure accordingly. This snippet shows you how to do it; it's up to you to figure out how to use it. If I had a list of lists, for example [[hi, hello], [by, bye]], this code would iterate over that list and i[0] would be hi on the first pass and by on the second pass. – reticentroot Oct 08 '15 at 23:31
  • Ok. It's actually saved as a dictionary (e.g. `{'washington_speech-3446': 8873,'washington_speech-3447': 8874, ...}`), so a bit of modification and I should be good, I suppose. – blacksite Oct 09 '15 at 00:19
  • okay, added an example of how it would work with a dictionary. – reticentroot Oct 09 '15 at 00:23
  • okay. that definitely makes sense now, me being able to read the code for writing from a dictionary. thanks! – blacksite Oct 09 '15 at 00:34
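
For readers on Python 3, the dictionary approach above might look roughly like the sketch below. The sample `data` values, the `write_counts` helper name, and the `count`/`filename` header labels are assumptions made for illustration; they are not part of the original code. The count is written first so that it lands in the left column, as the question asks.

```python
import csv
import io

# Hypothetical sample data in the shape described in the comments:
# {filename: contraction_count}
data = {"obama_speech-4427": 79, "obama_speech-4424": 101}


def write_counts(fp, counts):
    """Write a header row, then one (count, filename) row per entry."""
    writer = csv.writer(fp)
    writer.writerow(["count", "filename"])
    for filename, count in counts.items():
        writer.writerow([count, filename])


# An in-memory buffer keeps the sketch runnable without touching disk.
# For a real file, use: open("contraction_counts.csv", "w", newline="")
buffer = io.StringIO(newline="")
write_counts(buffer, data)
output = buffer.getvalue()
print(output)
```

Each call to `writerow` receives a two-item list, so every speech ends up on its own line with exactly two columns.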

Your problem is that you open the output file inside the loop in w mode, meaning that it is erased on each iteration. You can easily solve it in 2 ways:

  1. move the open outside of the loop (normal way). You will open the file only once, add a line on each iteration and close it when exiting the with block:

    with open('contraction_counts.csv','w',newline='') as fp:
        a = csv.writer(fp,delimiter = ',')
        for l in every_link_test:
            content_1 = processURL_short(l)
            for word in content_1.split():
                word = word.strip(p)
                if word in contractions:
                    count = count + 1
                splitlink = l.split("/")
                president = splitlink[4]
                speech_num = splitlink[-1]
                filename = "{0}_{1}".format(president,speech_num)
            data[filename] = count
            print count, filename
            a.writerows(data)
    
  2. open the file in a (append) mode. On each iteration you reopen the file and write at the end instead of erasing it - this way uses more IO resources because of the open/close, and should be used only if the program can break and you want to be sure that all that was written before the crash has actually been saved to disk

    for l in every_link_test:
        content_1 = processURL_short(l)
        for word in content_1.split():
            word = word.strip(p)
            if word in contractions:
                count = count + 1
            splitlink = l.split("/")
            president = splitlink[4]
            speech_num = splitlink[-1]
            filename = "{0}_{1}".format(president,speech_num)
        data[filename] = count
        print count, filename
    
        with open('contraction_counts.csv','a',newline='') as fp:
            a = csv.writer(fp,delimiter = ',')
            a.writerows(data)
    
Serge Ballesta
  • Both of those solutions leave me back where I started - with the output to `contraction_counts.csv` being each letter of `filename` in its own individual column, with no inclusion of the actual contraction counts. – blacksite Oct 08 '15 at 23:22
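
The one-letter-per-column output described in this comment is consistent with how `csv.writer.writerows` treats its argument: it iterates over it and writes each item as a row. Iterating over a dictionary yields only its keys, and each key here is a string, which is in turn iterated character by character. A minimal Python 3 sketch of the effect and one possible fix follows; the single-entry `data` dictionary is an assumed sample, not real output.

```python
import csv
import io

# Assumed sample in the same shape as the question's `data` dictionary.
data = {"obama_speech-4427": 79}

# writerows() iterates its argument. A dict yields its keys, and each key
# (a string) is then treated as a row of single-character fields.
broken = io.StringIO(newline="")
csv.writer(broken).writerows(data)
broken_output = broken.getvalue()

# Passing (count, filename) pairs gives one two-column row per speech.
fixed = io.StringIO(newline="")
csv.writer(fixed).writerows((count, name) for name, count in data.items())
fixed_output = fixed.getvalue()

print(broken_output)
print(fixed_output)
```

The same idea applies inside the loop: writing `a.writerow([count, filename])` for the current speech, rather than `a.writerows(data)`, should produce the two columns the question asks for.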