1

This is my first question on StackOverflow, so suggestions on how to make the question clearer are always welcome.

I have a CSV of point data structured as shown below:

OBJECTID,CART_ID,SHAPE
1,ABC,"(1.2, -4.5)"
2,ABD,"(3.8, 9.1)"

Using the petl module in python 3.5, I am trying to convert the SHAPE string into two separate columns of float objects. Using what I understood from the petl documentation, I should be able to do it in three steps.

  1. Open the CSV: a = petl.fromcsv('file.csv')
  2. Convert SHAPE from a string to a tuple: b = petl.convert(a, 'SHAPE', tuple)
  3. Split the tuple into two columns using .unpack(): c = petl.unpack(b, 'SHAPE', ['LAT', 'LON']

I believed this would result in a CSV file that looks like this:

OBJECTID,CART_ID,LAT,LON
1,ABC,1.2,-4.5
2,ABD,3.8,9.1

Instead, .convert() produces:

OBJECTID,CART_ID,SHAPE
1,ABC,('(', '1', '.', '2', ',', ' ', '-', '4', '.', '5', ')')

Any help on a) what .convert() is doing, or b) how to restructure the CSV would be appreciated.

Thank you.

Full code:

import petl
a = petl.fromcsv('file.csv')
petl.look(a)
b = petl.convert(a, 'SHAPE', tuple)
petl.look(b)
c = petl.unpack(b, 'SHAPE', ['LAT', 'LON']
petl.look(c)
Dimgold
  • 2,748
  • 5
  • 26
  • 49

3 Answers3

1

You don't have a compound table as petl expects, you still have a string. You need to pass it to a different function in order to interpret it:

>>> ast.literal_eval('(1, 2)')
(1, 2)

Integration of ast.literal_eval() into petl is left as an exercise for the reader.

Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
0

As mentioned by Ignacio, you still need to split up the string data. This could either be done using strip() and split(), or as suggested using ast.literal_eval() to safely evaluate the contents of the final column.

The following example makes use of this. It reads your CSV file in, splits the final column into its constituent parts and creates a new output CSV file:

import csv
import ast

with open('input.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    csv_output.writerow(next(csv_input)[:2] + ['LAT', 'LON'])

    for row in csv_input:
        csv_output.writerow(row[:2] + list(ast.literal_eval(row[2])))

Giving you an output.csv looking like:

OBJECTID,CART_ID,LAT,LON
1,ABC,1.2,-4.5
2,ABD,3.8,9.1
Martin Evans
  • 45,791
  • 17
  • 81
  • 97
0

Using PETL, you could just use capture and a regular expression:

b = petl.capture(a, 'SHAPE', r'\(\s*([-0-9.]+)\s*,\s*([-0-9.]+)\s*\)', ['LAT', 'LON'])
c = petl.convert(b, ['LAT', 'LON'], float)

This should detect the numbers and create 2 new columns named LAT and LON, but they would be strings, so you might need to use convert to float after that.

A slightly more "evil" approach would be to realize that the expression is valid python for a tuple with two floats, so you could just say:

b = petl.convert(a, 'SHAPE', eval)
c = petl.unpack(b, 'SHAPE', ['LAT', 'LON'])
swstephe
  • 1,840
  • 11
  • 17