5

Say I have a text file named "string.txt" and I want to convert it into a JSON text file. What am I supposed to do? I have tried 'json.loads()', but it never works for me. Here is part of my text file:

rdian","id":"161428670566653"},{"category":"Retail and consumer merchandise","category_list":[{"id":"187937741228885","name":"Electronics Store"},{"id":"191969860827280","name":"Photographic Services & Equipment"}

Any help, please? Edit: I have used this code:

import json

f = open("string.txt", 'w')
f1 = open("stringJson.txt", 'r')


f.write(json.dumps(json.loads(f), indent=1))


f.close()

the error is like this:

    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer


arze ramade
  • That certainly looks like a fragment of JSON. What's the error you get when you try to load it? – babbageclunk Jan 03 '14 at 10:11
  • For one, you have two double-quotes before string.txt => `f = open(""string.txt", 'w')`; second, your JSON snippet (first code block) is truncated and missing part of it. –  Jan 03 '14 at 10:18
  • @Nicarus: yes, that was just a typo from when I copied my code here! – arze ramade Jan 03 '14 at 10:20
  • OK. How about the JSON? It looks like the first part of it is cut off. –  Jan 03 '14 at 10:20
  • @Nicarus: I know; because the file is very large, I only gave you a snapshot of it! – arze ramade Jan 03 '14 at 10:21
  • OK - so you know for sure the text is valid JSON? –  Jan 03 '14 at 10:21
  • Googled your error - this was the first result. See if it helps you. http://stackoverflow.com/questions/12326370/various-errors-while-parsing-json-in-python –  Jan 03 '14 at 10:22
  • No, because it is not in JSON format! – arze ramade Jan 03 '14 at 10:23
  • It looks like this, line after line: {"data":[{"category":"University","category_list":[{"id":"108051929285833","name":"College & – arze ramade Jan 03 '14 at 10:24
  • @arzeramade Take it easy with the tone - just trying to better understand what you are asking. –  Jan 03 '14 at 10:27
  • Does the file contain line breaks? Post a few complete lines in that case. – Janne Karila Jan 03 '14 at 10:28
  • @Nicarus I have included a snapshot of my file – arze ramade Jan 03 '14 at 10:37
  • That snapshot doesn't help because it could be anywhere in the file where it would be out of format for JSON and therefore not load properly or at all. –  Jan 03 '14 at 10:39
  • You may have multiple JSON objects in the file. In that case see [Retrieving JSON objects from a text file (using Python)](http://stackoverflow.com/q/8730119/222914) – Janne Karila Jan 03 '14 at 10:48

2 Answers

10
import json

# Text mode works with json.load/json.dump on both Python 2 and Python 3.
with open("string.txt", "r") as fin:
    content = json.load(fin)
with open("stringJson.txt", "w") as fout:
    json.dump(content, fout, indent=1)

See http://docs.python.org/2/library/json.html#basic-usage
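
The TypeError in the question comes from passing a file object to json.loads, which expects a string; json.load is the variant that reads from a file object. A minimal sketch of the difference, assuming Python 3 and using an in-memory io.StringIO in place of string.txt (the sample payload is made up for illustration):

```python
import io
import json

# Hypothetical stand-in for the contents of string.txt.
sample = '{"data": [{"category": "University", "id": "1"}]}'

# json.loads parses a string that is already in memory.
from_string = json.loads(sample)

# json.load reads from a file-like object (anything with a .read() method).
from_file = json.load(io.StringIO(sample))

# Passing the file object itself to json.loads raises TypeError,
# which is the mistake in the question's code.
try:
    json.loads(io.StringIO(sample))
    loads_accepts_file = True
except TypeError:
    loads_accepts_file = False
```

Both calls produce the same Python object; only the kind of input differs.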

igrinis
Shaung
  • This is the error: Extra data: line 1 column 102663 - line 1 column 106497 (char 102662 - 106496) – arze ramade Jan 03 '14 at 10:30
  • @arzeramade It seems pretty obvious from the errors that the text file is not completely in the appropriate format to be loaded as JSON. You should review the file, fix the issue, and retry Shaung's solution. –  Jan 03 '14 at 10:40
  • @arzeramade What is there in the indicated range? Use eg. this to check: `print open("string.txt","rb").read()[102662:106497]` – Janne Karila Jan 03 '14 at 10:42
  • @Nicarus: this is what I was telling you. Do I have to fix it manually so that it can be loaded as JSON? – arze ramade Jan 03 '14 at 10:44
  • @Nicarus: yes, I know that, but how do I make the text valid for json.loads? – arze ramade Jan 03 '14 at 10:48
  • @JanneKarila: it prints large chunks of the data that exists in the "string.txt" file – arze ramade Jan 03 '14 at 10:49
  • I used this to get the file: s = urllib2.urlopen("https://graph.facebook.com/search?limit=5000&type=page&q=%26&access_token="+accesstoken+"&__after_id=139433456868").read() – arze ramade Jan 03 '14 at 10:52
  • and then I stored it like this: f.write(s). I have tried to store it as JSON, but it never works – arze ramade Jan 03 '14 at 10:53
  • @arzeramade I don't know what to tell you. I think it was suggested to check if it contained multiple JSON objects - try that. –  Jan 03 '14 at 10:53
  • @arzeramade Try another query that gives a small result. That would be much easier to troubleshoot. – Janne Karila Jan 03 '14 at 10:59
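
An "Extra data" error means the parser read one complete JSON value and then found more text after it. One possible cause, as suggested in the comments, is several JSON documents written back to back in the same file; whether that is the case for this particular file is not confirmed here. A minimal sketch of reading such a file with json.JSONDecoder.raw_decode, where iter_json_documents and the sample text are hypothetical names and data used only for illustration:

```python
import json


def iter_json_documents(text):
    """Yield each JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample: two JSON documents separated by a newline.
sample = '{"data": [1, 2]}\n{"data": [3]}'
documents = list(iter_json_documents(sample))
```

Each yielded value can then be written out with json.dump on its own, or collected into one list and dumped as a single JSON array.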
0

It really depends on how your txt file is structured. But suppose you have a structured txt file like the following:

BASE|30-06-2008|2007|2|projected
BASE|30-06-2007|2010|1|projected
BASE|30-06-2007|2009|3|projected
BASE|30-06-2007|2020|2|projected
...

You could use a script like this:

import json

import numpy as np
import pandas as pd

raw_filepath = "your_data.txt"
out_json_path = "your_data.json"

field_names = [
    "Scenario",
    "Date",
    "Year",
    "Quarter",
    "Value",
]

# dtype=None lets NumPy infer the type of each column individually
data_array = np.genfromtxt(raw_filepath, delimiter="|", dtype=None, encoding="utf-8")
df = pd.DataFrame.from_records(data_array)
df.columns = field_names

# Round-trip through to_json so the records become plain Python objects
result = df.to_json(orient="records")
parsed = json.loads(result)

# Save the records as indented JSON; the with block closes the file
with open(out_json_path, "w", encoding="utf-8") as fout:
    json.dump(parsed, fout, sort_keys=False, indent=4)

Explanation

To load a dataset with NumPy, we can use the genfromtxt() function. We pass the data file name, a delimiter (optional, but often needed), and, if the file has a header row, the number of rows to skip (skip_header). The optional dtype parameter specifies the data type of the columns. Its default is float; passing dtype=None, as in the script above, makes NumPy infer the type of each column individually.

In df.to_json(orient="records") we encode the DataFrame as 'records'-formatted JSON, that is, a list with one object per row. Note that index labels are not preserved with this encoding. This way, we get output like the following, as described in the pandas documentation:

>>> result = df.to_json(orient="records")
>>> parsed = json.loads(result)
>>> print(json.dumps(parsed, indent=4))
[
    {
        "col 1": "a",
        "col 2": "b"
    },
    {
        "col 1": "c",
        "col 2": "d"
    }
]
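
If NumPy and pandas are not available, the same pipe-delimited conversion can be sketched with only the standard library. This swaps in csv.DictReader for genfromtxt and is not the method used in the script above; the in-memory io.StringIO stands in for your_data.txt and holds two rows from the sample data:

```python
import csv
import io
import json

field_names = ["Scenario", "Date", "Year", "Quarter", "Value"]

# In-memory stand-in for your_data.txt, using rows from the sample data.
raw = io.StringIO(
    "BASE|30-06-2008|2007|2|projected\n"
    "BASE|30-06-2007|2010|1|projected\n"
)

# DictReader maps each pipe-separated field to the matching name.
reader = csv.DictReader(raw, fieldnames=field_names, delimiter="|")
records = [dict(row) for row in reader]

# Unlike genfromtxt with dtype=None, csv leaves every field as a string.
json_text = json.dumps(records, indent=4)
```

The trade-off is that no type inference happens, so "Year" and "Quarter" stay strings unless they are converted explicitly.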
Nayanexx.py