EDIT: A coworker had added quotation marks in front of the strings in the print command used for the later database insert. That caused the problem on Linux, for whatever reason ;).
I'm running a script that converts JSON to CSV as a preprocessing step for a database insertion. Typical input looks like this:
{"created_at":"Tue Jan 29 15:08:37 +0000 2013","id":296273650116612096,"id_str":"296273650116612096","text":"#BronxBeerHall #GrandOpening Friday 2\/1\/13 11:00am\n2344 #ArthurAvenue, #Bronx 10458 http:\/\/t.co\/1eov0j40","source":"web","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":361449701,"id_str":"361449701","name":"The Bronx Calendar","screen_name":"TheBXCalendar","location":"Bronx NY","url":"http:\/\/www.thebronxcalendar.com","description":"The Bronx Calendar is your first stop for what\u2019s happening and where you need to be in this Beautiful Borough. ","protected":false,"followers_count":58,"friends_count":153,"listed_count":1,"created_at":"Wed Aug 24 20:12:25 +0000 2011","favourites_count":0,"utc_offset":-18000,"time_zone":"Eastern Time (US & Canada)","geo_enabled":true,"verified":false,"statuses_count":76,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"022238","profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/319019365\/AHLB3023.jpg","profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/319019365\/AHLB3023.jpg","profile_background_tile":false,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/2962359572\/523aa999403a3239d0478120f33bec75_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/2962359572\/523aa999403a3239d0478120f33bec75_normal.jpeg","profile_link_color":"0084B4","profile_sidebar_border_color":"E36607","profile_sidebar_fill_color":"F0B371","profile_text_color":"333333","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":{"id":"27485069891a7938","url":"http:\/\/api.twitter.com\/1\/geo\/id\/27485069891a7938.json","place_type":"
city","name":"New York","full_name":"New York, NY","country_code":"US","country":"United States","bounding_box":{"type":"Polygon","coordinates":[[[-74.259090,40.477399],[-74.259090,40.917577],[-73.700272,40.917577],[-73.700272,40.477399]]]},"attributes":{}},"contributors":null,"retweet_count":0,"entities":{"hashtags":[{"text":"BronxBeerHall","indices":[0,14]},{"text":"GrandOpening","indices":[15,28]},{"text":"ArthurAvenue","indices":[56,69]},{"text":"Bronx","indices":[71,77]}],"urls":[{"url":"http:\/\/t.co\/1eov0j40","expanded_url":"http:\/\/thebronxnyc.tumblr.com\/post\/41721636567","display_url":"thebronxnyc.tumblr.com\/post\/417216365\u2026","indices":[84,104]}],"user_mentions":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"lang":"en"}
The input consists of Twitter messages. The important part is this:
"entities":{"hashtags":[{"text":"BronxBeerHall","indices":[0,14]},{"text":"GrandOpening","indices":[15,28]},{"text":"ArthurAvenue","indices":[56,69]},{"text":"Bronx","indices":[71,77]}]
One function takes care of the hashtags in the tweet. The aim is to get all hashtags comma-separated in one column. I'm using the built-in json module at the moment. I may switch back to simplejson later, once this problem is solved.
def hashtag_return():
    hashtag_comma_sep = ""
    count_hashtag = str(tw_hashtags).count("u'indices")
    for i in range(count_hashtag):
        hashtag = tw_hashtags[i]["text"]
        hashtag_comma_sep += hashtag + ","
    return hashtag_comma_sep[:-1]
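For comparison, here is a minimal sketch of the same comma-joining step that iterates over the parsed list directly instead of counting a marker in its string form. The name `hashtags_to_csv` and the explicit `hashtags` parameter are assumptions made for illustration; they are not part of the original script.

```python
def hashtags_to_csv(hashtags):
    """Join the "text" field of every hashtag entity with commas.

    `hashtags` is assumed to be the list found under
    tweet["entities"]["hashtags"] after json.loads().
    """
    # join() puts separators only between items, so there is no
    # trailing comma to strip, and an empty list yields "".
    return ",".join(tag["text"] for tag in hashtags)


# Two entities copied from the sample tweet above.
sample = [
    {"text": "BronxBeerHall", "indices": [0, 14]},
    {"text": "GrandOpening", "indices": [15, 28]},
]
print(hashtags_to_csv(sample))
```

Written this way, the result no longer depends on how the list happens to be rendered by `str()`.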
The function works fine on my laptop but not on our server, where the result is always an empty string.
laptop: Win7, Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
server: Linux, Python 2.7.3 (default, Sep 26 2013, 20:03:06) [GCC 4.6.3] on linux2
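Because `hashtag_return()` takes its loop count from the text of `str(tw_hashtags)`, anything that changes that text makes the loop run zero times and the result come back empty. A rough diagnostic, assuming the same input line is available on both machines, is to compare the marker count with the real list length. `describe_parse` is a hypothetical helper name, not part of the original script.

```python
import json
import sys


def describe_parse(raw_line):
    """Report how the hashtag list looks on this interpreter."""
    hashtags = json.loads(raw_line)["entities"]["hashtags"]
    text = str(hashtags)
    return {
        "python": sys.version_info[:3],
        "repr": text,
        # What the original function counts: occurrences of the
        # literal text u'indices in the string form of the list.
        "marker_count": text.count("u'indices"),
        # What it is meant to count: the number of hashtag entities.
        "actual_count": len(hashtags),
    }


raw = ('{"entities":{"hashtags":[{"text":"ArthurAvenue","indices":[56,69]},'
       '{"text":"Bronx","indices":[71,77]}]}}')
info = describe_parse(raw)
for key in sorted(info):
    print("%s: %r" % (key, info[key]))
```

If `marker_count` and `actual_count` differ on one machine, the string form of the list is what differs between the two environments.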
The JSON strings are saved in txt files, one tweet per line. Before the code below runs, I read the lines from the file.
line = json.loads(line)
tw_hashtags = line["entities"]["hashtags"]
if tw_hashtags == []:
    tw_hashtags = ""
else:
    tw_hashtags = hashtag_return()
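The per-line step above could also be sketched as a single function that receives the raw line and returns the hashtag column, so that nothing is passed through a module-level variable. This is only an illustration under assumptions: `hashtag_column` is a made-up name, and the two input lines are trimmed-down stand-ins for real tweets.

```python
import json


def hashtag_column(raw_line):
    """Parse one tweet (one JSON line) and return its hashtag column."""
    tweet = json.loads(raw_line)
    # Fall back to an empty list if "entities" or "hashtags" is missing.
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    # An empty list joins to "", so no separate empty-list branch is needed.
    return ",".join(tag["text"] for tag in hashtags)


# Stand-in for the lines read from the txt file (one tweet per line).
lines = [
    '{"entities":{"hashtags":[{"text":"Bronx","indices":[71,77]}]}}',
    '{"entities":{"hashtags":[]}}',
]
columns = [hashtag_column(raw_line) for raw_line in lines]
print(columns)
```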