
First, I apologize if my description is not accurate enough; I am a total newbie and don't know a thing about programming. Don't hesitate to tell me if you need more detailed info, but I will try to be as precise as possible.

So I have downloaded a bunch of tweets using Twitter's API from the Terminal (through Twurl). All the tweets are in a .json file (which I open with TextWrangler; I'm on a Mac). The problem is that when I convert my .json file to a .csv file, so that I can process and analyze the data more easily in Excel (or rather LibreOffice's equivalent), I don't get all the fields I need for my study: the "bio" of each tweet's author, which is present in the .json file, is missing.

In other words, my final table has a column for the tweet ID, one for the tweet author, one for the text of the tweet itself and so on, but no column for the author's bio, even though that information is in the .json file. So my question is: is there some code or tool that would let me add one more column to my final .csv table containing another field from the original .json file?

Again, this may not be clear, so don't hesitate to tell me if you need me to highlight a specific point.

Thanks in advance for any insight. I really need help on this one: it's for a research project I need to carry out for my PhD, so any help would be more than welcome!

EDIT: As an example, here is a sample of the data I have for one tweet in my original .json file:

{
    "created_at": "Mon Apr 28 09:00:40 +0000 2014",
    "id": 460705144846712800,
    "id_str": "460705144846712832",
    "text": "Work can suck a dick today",
    "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
    "truncated": false,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user": {
        "id": 253350311,
        "id_str": "253350311",
        "name": "JEEEZUS",
        "screen_name": "Maxi_Flex",
        "location": "Southchestershire",
        "url": "http://www.soundcloud.com/maxi_flex",
        "description": "Jazz Personality.G Mentality.",
        "protected": false,
        "followers_count": 457,
        "friends_count": 400,
        "listed_count": 1,
        "created_at": "Thu Feb 17 02:08:57 +0000 2011",
        "favourites_count": 1229,
        "utc_offset": null,
        "time_zone": null,
        "geo_enabled": true,
        "verified": false,
        "statuses_count": 13661,
        "lang": "en",
        "contributors_enabled": false,
        "is_translator": false,
        "is_translation_enabled": false,
        "profile_background_color": "08ABFC",
        "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/444297891977244672/Z1BkfCFB.jpeg",
        "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/444297891977244672/Z1BkfCFB.jpeg",
        "profile_background_tile": true,
        "profile_image_url": "http://pbs.twimg.com/profile_images/454073282778902529/gCGicDBH_normal.jpeg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/454073282778902529/gCGicDBH_normal.jpeg",
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/253350311/1392339276",
        "profile_link_color": "FA05F2",
        "profile_sidebar_border_color": "FFFFFF",
        "profile_sidebar_fill_color": "DDEEF6",
        "profile_text_color": "333333",
        "profile_use_background_image": true,
        "default_profile": false,
        "default_profile_image": false,
        "following": null,
        "follow_request_sent": null,
        "notifications": null
    },
    "geo": null,
    "coordinates": null,
    "place": null,
    "contributors": null,
    "retweet_count": 0,
    "favorite_count": 0,
    "entities": {
        "hashtags": [],
        "symbols": [],
        "urls": [],
        "user_mentions": []
    },
    "favorited": false,
    "retweeted": false,
    "filter_level": "medium",
    "lang": "en"
}

So the final .csv file contains some of the fields above, but what I need to add is the "description" field (inside the "user" object) of each tweet. Any help would be appreciated!

ComicSansMS
Michael Gauthier

2 Answers


Any good JSON-to-CSV converter will work; try this one. If there is something funky in the JSON, we need an example of the input JSON and of what is getting spit out.

If you just need that one field, enter the following command on the command line:

sed -n 's/.*"description": *"\([^"]*\)".*/"\1"/p' test.json > result.csv

Where test.json is the file with all the JSON entries in it.

Here is the output from an example I ran:

sed -n 's/.*"description": *"\([^"]*\)".*/\1/p' test.json
Jazz Personality.G Mentality.
Jazz Personality.G Mentality.
Jazz Personality.G Mentality.
Jazz Personality.G Mentality.
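The sed one-liner treats the JSON as plain text, so a bio containing an escaped quote (\") gets cut short, and it does not parse the structure at all. A JSON-aware alternative is sketched here in Python, using only the standard json and csv modules (the csv writer quotes bios that contain commas, so they stay in one column). It assumes that each line of test.json holds one complete tweet, i.e. line-delimited output; if your file is pretty-printed like the sample in the question, that assumption does not hold.

```python
import csv
import json
import sys


def extract_descriptions(lines, out):
    """Write a CSV with one row per tweet holding the author's bio.

    Assumes every non-blank line in `lines` is one complete tweet object.
    """
    writer = csv.writer(out)  # quotes fields that contain commas or quotes
    writer.writerow(["description"])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between tweets
        tweet = json.loads(line)
        # The bio sits in the nested "user" object and may be null.
        user = tweet.get("user") or {}
        writer.writerow([user.get("description") or ""])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 descriptions.py test.json > result.csv
    with open(sys.argv[1], encoding="utf-8") as f:
        extract_descriptions(f, sys.stdout)
```

The script name descriptions.py is hypothetical. Because the file is read one line at a time, memory use stays small even for tens of thousands of tweets.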

If the file is very large, you may need to split it into parts:

split -l N test.json part

Where N is the number of lines per part.

Usman Ismail
  • Thanks for the link, I didn't know about it, but apparently it won't work because my file is extremely big (about fifty thousand tweets), hence my need to process it automatically, through Excel for example... But thanks a lot for the advice, I really appreciate it! – Michael Gauthier Apr 28 '14 at 18:20

The problem is probably that JSON is hierarchical and CSV is not. I'm guessing that you are only getting the top level JSON elements and not the nested objects. For example if your JSON is:

{
    "name": "test",
    "author": {
        "id": 123,
        "created": ""
    }
}

are you only getting "name" and not "author.id"? If so, check out other questions on Stack Overflow about flattening JSON for CSV, e.g. flattening json to csv format
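To illustrate what "flattening" means here, a minimal recursive sketch in Python that turns nested objects into dotted column names. The function name flatten is my own choice for illustration, not a library function.

```python
def flatten(obj, prefix=""):
    """Turn nested dicts into a single-level dict with dotted keys.

    {"author": {"id": 123}} becomes {"author.id": 123}. Lists are kept
    as-is; they would need their own convention (e.g. joining values).
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


example = {"name": "test", "author": {"id": 123, "created": ""}}
print(flatten(example))
# {'name': 'test', 'author.id': 123, 'author.created': ''}
```

Applied to the tweet in the question, the bio ends up under the key user.description, which can then become its own CSV column.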

Community
pherris
  • Thanks for the answer! Yes, I am getting the author id. The full list of the fields in my final csv file is: "tweet id", "tweet time", "tweet author", "tweet author id", "tweet language", "tweet geo" and "tweet text". I would need to add something like "author description" to have all the data I need. – Michael Gauthier Apr 28 '14 at 18:24
  • I just took a look at the link you provided, thanks by the way, but as I said, I am a total newbie, so I don't really understand what is going on in that thread... I don't know most of the technical terms, so I am a bit confused... :s – Michael Gauthier Apr 28 '14 at 18:28
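For the column list given in the comment above plus the author's bio, a sketch in Python. The mapping from each column name to a JSON field (for example "tweet author" to user.screen_name) is an assumption, since the question does not say which fields the original converter used; the script also assumes one tweet per line. It reads id_str rather than id because, as the sample shows, tools that treat numbers as floating point can round the numeric id (460705144846712800 versus 460705144846712832).

```python
import csv
import json
import sys

# Hypothetical mapping: CSV header -> path of keys into the tweet object.
# Which JSON field the original converter used for each column is a guess.
COLUMNS = [
    ("tweet id", ("id_str",)),
    ("tweet time", ("created_at",)),
    ("tweet author", ("user", "screen_name")),
    ("tweet author id", ("user", "id_str")),
    ("tweet language", ("lang",)),
    ("tweet geo", ("geo",)),
    ("tweet text", ("text",)),
    ("author description", ("user", "description")),
]


def lookup(tweet, path):
    """Follow a path of keys into nested dicts; return "" if a step is missing or null."""
    value = tweet
    for key in path:
        if not isinstance(value, dict):
            return ""
        value = value.get(key)
    if value is None:
        return ""
    if isinstance(value, (dict, list)):
        return json.dumps(value)  # e.g. a geo object, kept as JSON text
    return value


def tweets_to_csv(lines, out):
    """Write a header row, then one row per tweet (one tweet per input line)."""
    writer = csv.writer(out)
    writer.writerow([header for header, _ in COLUMNS])
    for line in lines:
        if line.strip():  # skip blank lines
            tweet = json.loads(line)
            writer.writerow([lookup(tweet, path) for _, path in COLUMNS])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 tweets_to_csv.py tweets.json > tweets.csv
    with open(sys.argv[1], encoding="utf-8") as f:
        tweets_to_csv(f, sys.stdout)
```

Adding another field later only means adding one more (header, path) pair to COLUMNS; the script and file names are hypothetical.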