I have some lat/long coordinates that I've imported from a CSV file, but for some reason the first latitude value in the file turns into 0 when converted to a float, and I'm not sure what to do. Every check I run on the values reports the encoding as UTF-8.

"34.1760212".to_f should not be converting to 0

sample data

34.1760212,-116.9995880
34.1521587,-116.9844818
34.1697721,-116.9020844
34.1657952,-116.8505859
34.1521587,-116.8100739
34.1709084,-116.7949677
34.1879499,-116.8402863
34.1964693,-116.8759918
34.1947655,-116.9178772
34.1862459,-116.9508362
34.1805656,-116.9783020
34.1763052,-116.9983006
>> p interest.geofence.first
#=> ["34.1760212", "-116.9995880"]

>> p interest.geofence.first.first
"34.1760212"
=> "34.1760212"

>> p interest.geofence.first.first.encoding
#<Encoding:UTF-8>
=> #<Encoding:UTF-8>

>> p interest.geofence.first.first.to_f
0.0
=> 0.0

>> p interest.geofence.first.last
"-116.9995880"
=> "-116.9995880"

>> p interest.geofence.first.last.encoding
#<Encoding:UTF-8>
=> #<Encoding:UTF-8>

>> p interest.geofence.first.last.to_f
-116.999588
=> -116.999588

>> p '34.1760212'.bytes
[51, 52, 46, 49, 55, 54, 48, 50, 49, 50]
=> [51, 52, 46, 49, 55, 54, 48, 50, 49, 50]

>> p interest.geofence.first.first.bytes
[239, 187, 191, 51, 52, 46, 49, 55, 54, 48, 50, 49, 50]
=> [239, 187, 191, 51, 52, 46, 49, 55, 54, 48, 50, 49, 50]

>> p interest.geofence.first.first.class
String
=> String

>> p interest.geofence.first.first.method(:to_f)
#<Method: String#to_f()>
=> #<Method: String#to_f()>

>> Marshal.dump(interest.geofence.first.first)
=> "\x04\bI\"\x12\xEF\xBB\xBF34.1760212\x06:\x06ET"

>> Marshal.dump('34.1760212')
=> "\x04\bI\"\x0F34.1760212\x06:\x06ET"

I've also tried using BigDecimal without success. A hard-coded string works, but I need it to work with my imported value:

>> p BigDecimal('0')
0.0
=> 0.0
>> p BigDecimal('0.9')
0.9e0
=> 0.9e0
>> p BigDecimal(interest.geofence.first.first)
ArgumentError: invalid value for BigDecimal(): "34.1760212"
    (ripl):36:in `BigDecimal'
    (ripl):36:in `<main>'
>> p BigDecimal('34.1760212')
0.341760212e2
=> 0.341760212e2

More relevant details: here is the Mongoid model it's being saved under

class Interest
  include Mongoid::Document
  include Mongoid::Timestamps

  # ...

  field :name, type: String
  field :geofence, type: Array
end

and the import script

# ...
interest = FR::Interest.new
interest.name = place
interest.geofence = []

log :debug, tn, "importing #{filename}"
File.foreach("#{APP_ROOT}/tmp/geofences/#{filename}") do |line|
  csv_data = CSV.parse(line, col_sep: ',', headers: false)
  row = csv_data[0]
  lat = row[0]
  lon = row[1]

  interest.geofence << [lat, lon]
end

interest.save!
alilland
  • 1) how do you turn that CSV data into a `interest.geofence` object? That part is missing. 2) what is the `class` of those values? Maybe they just _look_ like strings. – Stefan Jun 28 '22 at 04:24
  • its imported into mongodb (mongoid/activerecord) as an array of strings – alilland Jun 28 '22 at 04:25
  • In Ruby, `"34.1760212".to_f` returns `34.1760212` so something else must be going on. What does `interest.geofence.first.first.class` return? – Stefan Jun 28 '22 at 04:27
  • returns a `String` – alilland Jun 28 '22 at 04:32
  • What does `interest.geofence.first.first.method(:to_f)` return? – Stefan Jun 28 '22 at 04:35
  • Let’s dig a little deeper. What does `Marshal.dump(interest.geofence.first.first)` return? – Stefan Jun 28 '22 at 04:37
  • `"\x04\bI\"\x12\xEF\xBB\xBF34.1760212\x06:\x06ET"` – alilland Jun 28 '22 at 04:38
  • Oh, it’s a much more common reason. Your string starts with a [UTF-8 BOM](https://en.m.wikipedia.org/wiki/Byte_order_mark), probably from parsing the CSV. – Stefan Jun 28 '22 at 04:42
  • now THAT makes sense! i'll do some hunting for how to strip it -- feel free to add it as the answer and i'll confirm in chat or on post – alilland Jun 28 '22 at 04:44
  • Make sure to read the CSV file as explained here: https://stackoverflow.com/questions/5011504/is-there-a-way-to-remove-the-bom-from-a-utf-8-encoded-file – Stefan Jun 28 '22 at 04:44
  • `File.foreach("#{APP_ROOT}/tmp/geofences/#{filename}", encoding: 'bom|utf-8')` fixed the issue, you are welcome to add it as the answer – alilland Jun 28 '22 at 04:54

1 Answer

@Stefan figured out the issue in the comments section, so all credit goes to him.

The issue came to light when running Marshal.dump:

>> Marshal.dump(interest.geofence.first.first)
=> "\x04\bI\"\x12\xEF\xBB\xBF34.1760212\x06:\x06ET"

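Specifically, the bytes \xEF\xBB\xBF reveal that there is a UTF-8 BOM at the beginning of the string, which could only have gotten there when I imported my text file. Because the BOM is not a digit, a sign, or whitespace, to_f finds no number at the start of the string and returns 0.0.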
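For anyone who wants to reproduce the symptom without the CSV file, here is a minimal sketch. The variable names (`bom`, `with_bom`, `stripped`) are mine and are not part of the import script; the value is the first latitude from the sample data.

```ruby
# A UTF-8 byte order mark is the code point U+FEFF, encoded as EF BB BF.
bom = "\uFEFF"

# Simulate the first field as it came out of the import.
with_bom = bom + "34.1760212"

p with_bom.bytes.first(3) # => [239, 187, 191]
p with_bom.to_f           # => 0.0, because the string does not start with a number

# Removing the BOM by hand makes the conversion work again.
stripped = with_bom.delete_prefix(bom)
p stripped.to_f           # => 34.1760212
```

Stripping by hand like this (`String#delete_prefix` needs Ruby 2.5 or later) is only a workaround for data that is already loaded; letting Ruby skip the BOM while reading the file is cleaner.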

To resolve it, I just had to re-import my data with an encoding instruction, encoding: 'bom|utf-8', which tells Ruby to skip a leading BOM if one is present:

File.foreach("#{APP_ROOT}/tmp/geofences/#{filename}", encoding: 'bom|utf-8') do |line|
  ## ...
end
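To show the difference end to end, here is a self-contained sketch under a few assumptions: it writes a throwaway two-row file with a BOM via `Tempfile` instead of using my real geofence file, parses each line with `CSV.parse_line`, and converts to floats right away, which my import script does not do. The names `file`, `raw_first_line`, and `geofence` are only for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for the real geofence file: two rows preceded by a UTF-8 BOM.
file = Tempfile.new(['geofence', '.csv'])
file.binmode
file.write("\xEF\xBB\xBF34.1760212,-116.9995880\n34.1521587,-116.9844818\n")
file.close

# Read as plain UTF-8: the BOM stays glued to the first line.
raw_first_line = File.foreach(file.path, encoding: 'utf-8').first
p raw_first_line.bytes.first(3) # => [239, 187, 191]
p raw_first_line.to_f           # => 0.0

# Read with 'bom|utf-8': the BOM is dropped before the first line is returned.
geofence = File.foreach(file.path, encoding: 'bom|utf-8').map do |line|
  lat, lon = CSV.parse_line(line)
  [lat.to_f, lon.to_f]
end
p geofence.first # => [34.1760212, -116.999588]

file.unlink
```

If the file has no BOM, 'bom|utf-8' simply reads it as UTF-8, so it should be safe to leave the option on for every import.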
alilland