How to identify multiple words and corresponding values from each line in a file ex: "status":"ok"

Question

I'm trying to write a script that extracts specific items from each line of a file, so that they can be inserted into an SQL database. My text file "addresses.txt" contains multiple lines like the following:

{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018","municipalityCode":"0766","municipalityName":"Hedensted","streetCode":"0072","streetName":"Værnegården","streetBuildingIdentifier":"13","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"8000","districtName":"Århus","presentationString":"Værnegården 13, 8000 Århus","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(553564 6179299)","x":553564,"y":6179299}]}

For example, I want to remove:

"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018"

In the end, I want a column list and a value list that can be written to a file_output.txt like:

INSERT INTO ADDRESSES (%s) VALUES (%s)

This is what I have so far:

# Writes %s into the file output_data.txt
address_line = """INSERT INTO ADDRESSES (%s) VALUES (%s)"""

# Reads every line from the file messy_data.txt
messy_string = file("addresses.txt").readlines()

cols = messy_string[0].split(",")  #Defines each word in the first line separated by , as a column name
colstr = ','.join(cols) # formatted string that will plug in nicely
output_data = file("output_data.txt", 'w') # Creates the output file: output_data.txt
for r in messy_string[0:]: # loop through everything after first line
    #r = r.replace(':',',')
    #temp_replace = r.translate(None,'"{}[]()')
    #address_list = temp_replace.split(",")
    #address_list = [x.encode('utf-8') for x in address_list]
    vals = r.split(",") # split at ,
    valstr = ','.join(vals) # join with commas for sql
    output_data.write(address_line % (colstr, valstr))  # write to file

output_data.close()

I've included some of my commented-out attempts in case they help. I also noticed that whenever I use address_list = temp_replace.split(","), all of my UTF-8 characters get garbled, and I don't know why or how to correct this.

UPDATE: Looking at the example in "How can I convert JSON to CSV?", I came up with this code to try to fix my problem:

# Reads every line from the file coordinates.txt
messy_string = file("coordinates.txt").readlines()

# Reads with the json module
x = json.loads(messy_string)

x = json.loads(x)
f = csv.writer(open('test.csv', 'wb+'))

for x in x:
f.writerow([x['status'], 
            x['message'], 
            x['data']['type'], 
            x['data']['addressAccessId'],
            x['data']['municipalityCode'],
            x['data']['municipalityName'],
            x['data']['streetCode'],
            x['data']['streetName'],
            x['data']['streetBuildingIdentifier'],
            x['data']['mailDeliverySublocationIdentifier'],
            x['data']['districtSubDivisionIdentifier'],
            x['data']['postCodeIdentifier'],
            x['data']['districtName'],
            x['data']['presentationString'],
            x['data']['addressSpecificCount'],
            x['data']['validCoordinates'],
            x['data']['geometryWkt'],
            x['data']['x'],
            x['data']['y']])

However, this does not fix my problem. Now I get the following error:

Traceback (most recent call last):
  File "test2.py", line 10, in <module>
    x = json.loads(messy_string)
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer

Can anyone help? Thanks in advance.

score 2 · Accepted Answer · answered Feb 20 '14 at 07:41

Each line looks like valid JSON to me. You can simply parse the JSON and select the keys you'd like to keep (as you would with a dictionary):

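import json

# Read every line of the file; each line holds one JSON document
messy_string = open("addresses.txt").readlines()

for line in messy_string:
  try:
    parsed = json.loads(line)  # parse the line into a dictionary
    column_names = parsed.keys()
    column_values = parsed.values()
    print(parsed)
  except ValueError:
    # json.loads raises ValueError when a line is not valid JSON
    raise ValueError('Could not parse line')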
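
Note that the top-level keys of each line are only "status", "message" and "data"; the address fields sit inside the first entry of the "data" list. A minimal sketch of drilling into that entry, dropping the unwanted keys and filling in the INSERT template could look like the following. It assumes Python 3, and the names `UNWANTED_KEYS`, `sql_literal` and `build_insert` are made up for illustration, as is the choice to write booleans as TRUE/FALSE. The sample line is a shortened version of the one in the question.

```python
import json

# Keys the question wants to leave out (assumption: only these two).
UNWANTED_KEYS = {"type", "addressAccessId"}


def sql_literal(value):
    """Format a parsed JSON value as a SQL literal (hypothetical helper)."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Treat everything else as text: wrap in single quotes, double embedded quotes.
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(line, table="ADDRESSES"):
    """Turn one JSON line into an INSERT statement string."""
    parsed = json.loads(line)   # one line is one JSON document
    record = parsed["data"][0]  # address fields live in the first "data" entry
    columns = [key for key in record if key not in UNWANTED_KEYS]
    values = [sql_literal(record[key]) for key in columns]
    return "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), ", ".join(values))


# Shortened version of the line shown in the question.
sample = ('{"status":"OK","message":"OK","data":[{"type":"addressAccessType",'
          '"addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018",'
          '"streetName":"Værnegården","postCodeIdentifier":"8000",'
          '"validCoordinates":true,"x":553564}]}')

print(build_insert(sample))
```

If the rows are sent to a database driver directly rather than written to a file, passing the values as query parameters is generally safer than building the statement as a string.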

Kartik


Thank you, Kartik. I tried your solution, and I'm getting a SyntaxError: invalid syntax when I try to write the column_values to the output file with clean_data.write(address_line % (column_values)). I'm still quite new to Python, so any elaboration is very much appreciated. – Philip Feb 20 '14 at 08:38
`column_values` is a list; `%s` works on strings. Try `print "%s" % ','.join(column_values)`. – Kartik Feb 20 '14 at 19:50
Thanks for your answer. I only started learning Python last week, and I'm not sure where you want me to do the print. I also get the following error: _"TypeError exceptions must be old-style classes or derived from BaseException not str"_. I'm trying to create a script that will let me transform the JSON text into CSV text, with the columns that I have selected. Could you possibly elaborate a bit more, and perhaps connect the pieces? Thank you in advance. – Philip Feb 24 '14 at 07:36
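
To connect the pieces for the CSV case: the "expected string or buffer" error in the question comes from passing the whole list returned by `readlines()` to `json.loads`, which expects a single string, so each line has to be parsed on its own. Also, "data" is a list, so its entries need to be indexed or looped over before a key such as 'streetName' can be read. The sketch below assumes Python 3; the function name `lines_to_csv` and the column selection in `SELECTED` are chosen for illustration only, and the sample line is a shortened version of the one in the question.

```python
import csv
import io
import json

# Columns to keep (an arbitrary selection for illustration).
SELECTED = ["streetName", "streetBuildingIdentifier",
            "postCodeIdentifier", "districtName"]


def lines_to_csv(lines, out):
    """Write one CSV row per entry in each line's "data" list."""
    writer = csv.writer(out)
    writer.writerow(SELECTED)  # header row
    for line in lines:
        line = line.strip()
        if not line:           # skip blank lines
            continue
        parsed = json.loads(line)      # parse one line at a time
        for record in parsed["data"]:  # "data" is a list of dictionaries
            # Missing keys become empty cells.
            writer.writerow([record.get(key, "") for key in SELECTED])


# Shortened version of the line shown in the question.
sample = ('{"status":"OK","message":"OK","data":[{"type":"addressAccessType",'
          '"streetName":"Værnegården","streetBuildingIdentifier":"13",'
          '"postCodeIdentifier":"8000","districtName":"Århus"}]}')

buffer = io.StringIO()  # in-memory stand-in for an output file
lines_to_csv([sample], buffer)
print(buffer.getvalue())
```

To work on real files instead of the in-memory buffer, the same function can be given file objects, for example one opened with `open("addresses.txt", encoding="utf-8")` for reading and one opened with `open("test.csv", "w", newline="", encoding="utf-8")` for writing. Opening both with an explicit UTF-8 encoding should also keep characters such as "æ" and "Å" intact, assuming the input file is saved as UTF-8.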
