
EDIT: In my attempt to narrow down my question, I may have oversimplified it in a way that makes it harder to answer. Let me try again. Assume that the dictionary is:

holder = {'key1':['headline1', 'body1'], 'key2':['headline2', 'body2']}

I want to write that dictionary to a CSV file with three columns: the key, the headline, and the body. My attempt to do that with the answer referenced below failed for the reasons described below.

Hopefully that's a bit more clear.


I'm scraping some Chinese news sites and trying to output the results to a CSV file. After scraping, the dictionary is structured as:

uniqueID : [headlines, body]

for each story. I'm trying to output to a CSV that ultimately reads:

uniqueID1 / headlines1 / body1
uniqueID2 / headlines2 / body2
uniqueID3 / headlines3 / body3

with each of those in a different column (so basically three columns with as many rows as I have stories).

I tried using the solution from this question but, in addition to flipping the X and Y axes (which I know how to fix), it also split every character of each headline and story into its own entry and broke the character encoding. Since I don't know how to fix either of those problems, I'm a bit stuck.

If it is helpful or relevant, I'm encoding the characters this way:

head_fixed = str(headline)
soup = BeautifulSoup(head_fixed, 'lxml')
good_output = soup.text.decode("unicode-escape").encode("utf-8")

Naturally, I'm also open to the suggestion that the way that I'm structuring the data is wrong.

Thank you for any ideas.

mweinberg
  • Can you show the whole code, or at least a minimal, complete, and verifiable example (http://stackoverflow.com/help/mcve)? It's incredibly difficult for anybody to test their solutions or even really understand your problem without it. – Keatinge May 14 '16 at 17:08
  • I'm voting to close this question as off-topic because the question is too vague. – martineau May 14 '16 at 17:12
  • Where are the unique IDs coming from? How are you getting the headlines vs the story bodies? Do you really think using `/` as a delimiter for your csv file is a good idea? What if the headline or body has that character in it? – martineau May 14 '16 at 17:15
  • Sorry, I clearly oversimplified. I edited the question above; hopefully it is a bit more clear. – mweinberg May 14 '16 at 18:41

2 Answers


This is very easy with pandas (you may need to pip install pandas):

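import pandas as pd

holder = {'key1':['headline1', 'body1'], 'key2':['headline2', 'body2']}

# Each dictionary key becomes a column here, so the frame has to be transposed
df = pd.DataFrame(holder)

# After transposing, the keys are the index and are written as the first column;
# header=False leaves out the numeric column labels
df.transpose().to_csv('output.csv', header=False)

# output.csv:
# key1,headline1,body1
# key2,headline2,body2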
Daniel

I ended up solving this problem by restructuring the data as a list, so:

holder = [[key1, headline1, body1], [key2, headline2, body2]]

Then I just used

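import csv

# 'wb' is the Python 2 idiom; on Python 3 use open('output.csv', 'w', newline='') instead
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(holder)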
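
As a side note on the "every character in its own entry" symptom from the question: one plausible cause (an assumption, since the earlier attempt is not shown) is that a bare string was passed where the csv module expected a row. A string is itself a sequence, so each character becomes a field. A minimal Python 3 sketch, writing to an in-memory buffer with a made-up sample value:

```python
import csv
import io

# Hypothetical sample value; any string behaves the same way.
headline = "headline1"

buffer = io.StringIO()
writer = csv.writer(buffer)

# A bare string is treated as a sequence of fields, one per character.
writer.writerow(headline)

# Wrapping the values in a list makes each string a single field.
writer.writerow(["key1", headline, "body1"])

lines = buffer.getvalue().splitlines()
print(lines[0])  # h,e,a,d,l,i,n,e,1
print(lines[1])  # key1,headline1,body1
```

This is why a list of lists works with writerows: every inner list is one row, and every string inside it is one cell.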

I am not sure if there is an advantage to using dictionaries, lists, or combinations of the two in this sort of situation. In this case switching to a list seemed to work, although I am a bit curious about the solution suggested by Daniel.
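
For comparison, here is a minimal sketch of writing the original dictionary directly, without restructuring it into a list first. It assumes Python 3 (the code in the question is Python 2), and `write_stories` is a hypothetical helper name used only for illustration. Opening the file in text mode with an explicit UTF-8 encoding should also avoid the manual decode/encode step for Chinese text.

```python
import csv

holder = {'key1': ['headline1', 'body1'], 'key2': ['headline2', 'body2']}

def write_stories(path, stories):
    """Write one CSV row per story: key, headline, body."""
    # newline='' lets the csv module control line endings (Python 3);
    # encoding='utf-8' keeps non-ASCII text intact in the file.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for key, (headline, body) in stories.items():
            # Each row is a list, so every value lands in its own column.
            writer.writerow([key, headline, body])

write_stories('output.csv', holder)
```

So the dictionary itself is not the problem. What matters is that each row handed to the writer is a list of three values. The csv module also quotes any field that contains the delimiter, so commas inside a headline or body do not break the columns.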

mweinberg
  • Glad to see you solved it. Just FYI: it's better practice on Stack Overflow to vote up or accept answers that help out, rather than submitting an answer of your own :) Good luck! – Daniel May 14 '16 at 20:23
  • Thanks! As is clear from everything in this question, I'm still learning my way around the site (and Python...). – mweinberg May 14 '16 at 22:13
  • It's all good. You should be able to vote up and accept answers: http://www.stackoverflow.com/help/someone-answers – Daniel May 14 '16 at 22:29