
I am sorry for asking this question, but I have already searched and could not find the answer. I am honestly a newbie. I am trying to generate a list of whole words from a CSV file that has a JSON column. I have already created a list of lines, but I cannot use split() to generate a new list containing the separate words (later I need to count word occurrences). My input file contains Twitter data. I tried to write some simple code:

myfile=open('fileName','r')
words=[]
for line in myfile:
    words.append(line.split())

len(words) returns 82

I also tried reader=csv.reader(myFile) and reader=csv.DictReader(myFile), but in every case I only get each line. How can I further split each string/line into individual words? Sorry, and thank you in advance.
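One likely reason split() seems not to work here: append() adds each line's word list as a single element, so words ends up as a list of lists with one entry per line, whereas extend() adds the words individually. A minimal sketch of the difference, using an invented two-line sample (the `sample`, `nested`, and `flat` names are illustrative, not from the original code):

```python
import io

# Stand-in for the opened file; the sample text is invented for illustration.
sample = io.StringIO("the quick brown fox\njumps over the lazy dog\n")

nested = []  # what append() builds: one sub-list per line
flat = []    # what extend() builds: one entry per word
for line in sample:
    tokens = line.split()
    nested.append(tokens)
    flat.extend(tokens)

print(len(nested))  # 2, the number of lines
print(len(flat))    # 9, the number of words
```

Note that splitting raw lines of this particular file would still mix CSV fields and JSON syntax into the words, so the tweet text has to be extracted first.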

My data (I changed to a different example, as the last one may have been badly formatted):

id,flags,expiration,cas,value
493926581610364928,0,0,2635740904247446,"{""contributors"":null,""truncated"":false,""text"":""@xaaronh @blueredandgold If Namco Bandai's One Piece Unlimited World is anything to go by, no local retail release means no eShop either =\\"",""in_reply_to_status_id"":493925918998425600,""id"":493926581610364928,""favorite_count"":0,""source"":""<a href=\""hp://twitter.com\"" rel=\""nofollow\"">Twitter Web Client</a>"",""retweeted"":false,""coordinates"":null,""entities"":{""symbols"":[],""user_mentions"":[{""id"":139852376,""indices"":[0,8],""id_str"":""139852376"",""screen_name"":""xaaronh"",""name"":""Aaron""},{""id"":74393990,""indices"":[9,24],""id_str"":""74393990"",""screen_name"":""blueredandgold"",""name"":""Leigh""}],""hashtags"":[],""urls"":[]},""in_reply_to_screen_name"":""xaaronh"",""in_reply_to_user_id"":139852376,""retweet_count"":0,""id_str"":""493926581610364928"",""favorited"":false,""user"":{""follow_request_sent"":false,""profile_use_background_image"":true,""default_profile_image"":false,""id"":42302246,""profile_background_image_url_hp"":""hp://pbs.twimg.com/profile_background_images/464279459932020736/v1xnMcrV.jpeg"",""verified"":false,""profile_text_color"":""333333"",""profile_image_url_https"":""hp://pbs.twimg.com/profile_images/490791031487463424/udSldTQ3_normal.png"",""profile_sidebar_fill_color"":""DDEEF6"",""entities"":{""description"":{""urls"":[{""url"":""hp:tttt"",""indices"":[67,89],""expanded_url"":""hp://infernalmonkey.com"",""display_url"":""infernalmonkey.com""}]}},""followers_count"":506,""profile_sidebar_border_color"":""000000"",""id_str"":""42302246"",""profile_background_color"":""1A1B1F"",""listed_count"":22,""is_translation_enabled"":false,""utc_offset"":36000,""statuses_count"":8676,""description"":""I probably tweet about video games and onaholes. Let's be friends! 
(NSFW)"",""friends_count"":261,""location"":""Sydney, Australia"",""profile_link_color"":""2FC2EF"",""profile_image_url"":""hp://pbs.twimg.com/profile_images/490791031487463424/udSldTQ3_normal.png"",""following"":false,""geo_enabled"":false,""profile_banner_url"":""hp://pbs.twimg.com/profile_banners/42302246/1406105444"",""profile_background_image_url"":""hp://pbs.twimg.com/profile_background_images/464279459932020736/v1xnMcrV.jpeg"",""screen_name"":""infernal_monkey"",""lang"":""en"",""profile_background_tile"":false,""favourites_count"":2018,""name"":""Lance McGill"",""notifications"":false,""url"":null,""created_at"":""Sun May 24 23:20:25 +0000 2009"",""contributors_enabled"":false,""time_zone"":""Sydney"",""protected"":false,""default_profile"":false,""is_translator"":false},""geo"":null,""in_reply_to_user_id_str"":""139852376"",""lang"":""en"",""_id"":""493926581610364928"",""created_at"":""Tue Jul 29 01:10:48 +0000 2014"",""in_reply_to_status_id_str"":""493925918998425600"",""place"":null,""metadata"":{""iso_language_code"":""en"",""result_type"":""recent""}}"
sweetsours
  • Could you post the data you are trying to parse in text format? You can edit and update your question to add it. I see you have an image of it, which is better than not having it at all, but text is easier to work with. – Igor Mar 30 '16 at 17:38
  • I am sorry for the bad formatting. Thank you @Igor – sweetsours Mar 31 '16 at 04:44
  • I don't know why, but every time I use json.loads(line) it returns an error. – sweetsours Mar 31 '16 at 06:42
  • My json-parsing-fu is weak this morning. Looks like there are definitely examples on the web and stack overflow of folks doing similar stuff. Here is one related I think: http://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data – Igor Mar 31 '16 at 14:36
  • Thank you so much @Igor for your previous comment pointing me to the JSON column. After a few trials here and there, and after understanding more about lists and dicts in Python, I finally managed to get the word occurrences for each word within the 'text' string. My code may be slightly long, but I am looking forward to understanding Python better. Thank you again – sweetsours Mar 31 '16 at 14:54
  • Glad to help a little. – Igor Mar 31 '16 at 14:55
  • You can post your solution as the answer if you would like. Might be handy if someone else has this problem. – Igor Mar 31 '16 at 14:59
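A possible explanation for the json.loads(line) error mentioned in the comments, assuming the file looks like the sample in the question: a raw line is a CSV row, not a JSON document, and the JSON sits in the fifth column with its quotes doubled by CSV escaping. The sketch below uses a shortened, invented row (`raw` is hypothetical data) to show that csv.reader undoes the escaping so that json.loads can parse the column:

```python
import csv
import io
import json

# Invented, shortened stand-in for the real file: header plus one row.
raw = (
    'id,flags,expiration,cas,value\n'
    '1,0,0,2,"{""text"":""hello world"",""lang"":""en""}"\n'
)

# Parsing the raw line directly fails, because it starts with CSV fields.
try:
    json.loads(raw.splitlines()[1])
    raw_line_parsed = True
except ValueError:
    raw_line_parsed = False

# csv.reader splits the row and turns "" back into " inside the quoted field.
reader = csv.reader(io.StringIO(raw))
header = next(reader)
row = next(reader)
tweet = json.loads(row[4])

print(raw_line_parsed)  # False
print(tweet["text"])    # hello world
```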

1 Answer


This is not the best solution, just an effort from a beginner (me), and it definitely needs further editing for better output. I am using Windows.

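import csv
import json

myList = []
# newline='' lets the csv module handle line breaks inside quoted fields
with open('fileName.csv', 'r', encoding='utf-8', newline='') as myFile:
    myReader = csv.reader(myFile)
    header = next(myReader)  # skip the header row: id,flags,expiration,cas,value
    for line in myReader:
        if len(line) < 5:
            continue  # skip blank or short rows
        # the fifth column ('value') holds the tweet as a JSON string
        tweet = json.loads(line[4])
        myList.append(tweet['text'])

# count how often each whitespace-separated word occurs
dct = {}
for eachLine in myList:
    for one in eachLine.split():
        dct[one] = dct.get(one, 0) + 1

# list of (word, count) pairs, sorted alphabetically by word
finalList = sorted(dct.items())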
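As an alternative to the hand-written counting loop, the standard library's collections.Counter does the same bookkeeping. This is only a sketch, not the answer's own code; `texts` is an invented stand-in for myList:

```python
from collections import Counter

# Stand-in for myList from the code above; these strings are invented.
texts = ["good game good fun", "good game"]

counts = Counter()
for text in texts:
    counts.update(text.split())  # add one count per word

print(counts.most_common(2))   # [('good', 3), ('game', 2)]
print(sorted(counts.items()))  # [('fun', 1), ('game', 2), ('good', 3)]
```

sorted(counts.items()) gives the same alphabetical list of pairs as finalList, while most_common() orders by frequency, which is often what a word-occurrence report needs.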
sweetsours
  • @Igor please do comment on my first attempt. I will edit it further to apply regex filtering. – sweetsours Mar 31 '16 at 15:22
  • I would edit this to include your imports and fix the syntax error on line 6 where closing ``)`` is missing. Though even after fixing that it is not working for me. – Igor Mar 31 '16 at 15:38
  • @Igor: I have added some extra lines. Thank you for your comment. For extra information, I am using the Windows platform. Cheers! – sweetsours Mar 31 '16 at 16:22
  • I am still getting this error: ``ValueError: Expecting value: line 1 column 1 (char 0)`` – Igor Mar 31 '16 at 17:54
  • I think to solve it the code must be edited to exclude the first line from the file. – sweetsours Apr 01 '16 at 05:13
  • When I remove the header and empty line from ``file.csv``, I am still getting an error: ``ValueError: Invalid \escape: line 1 column 1330 (char 1329)``. Did you actually get this to work? – Igor Apr 01 '16 at 17:29
  • Yes @Igor, it's working. I can send you the complete program if you are happy to comment on it – sweetsours Apr 02 '16 at 05:48
  • It is a good Stack Overflow practice to post the complete answer with working code. If you want to update it with a working version, it would be handy for everyone who stumbles upon this question in the future. – Igor Apr 03 '16 at 20:09
  • Looks like you edited it since last time I tried it. Will give it another shot. – Igor Apr 03 '16 at 20:10
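The regex filtering mentioned in the comments was never posted, so the following is only a guess at what such a step could look like. The `tokenize` helper and its pattern are assumptions for illustration: it lowercases the text and keeps runs of letters, digits, and apostrophes, so punctuation no longer sticks to the words and "No" and "no" are counted together.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Invented sample sentence, loosely based on the tweet in the question.
sample = "No local retail release means no eShop either!"
counts = Counter(tokenize(sample))

print(tokenize("Hello, world!"))  # ['hello', 'world']
print(counts["no"])               # 2
```

A pattern like this also breaks up mentions and URLs, so a real tweet pipeline would probably need extra rules for those.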