2

Hi I'm trying to import CSV data into a spatially enabled Postgres database. The data is available here. I'm unsure where I went wrong and any help is greatly appreciated! What I'm attempting to do is visualize that data with D3.js and maybe display a heat density of the most libraries per town or something to that degree.

File: lib/tasks/import_incidents_csv.rake

require 'csv'

namespace :import_incidents_csv do

  task :create_incidents => :environment do

    csv_text = File.read('/home/mgmacri/data/PublicLibraryBranchLocations.csv')
    csv = CSV.parse(csv_text, :headers => true)

    csv.each do |row|
      row = row.to_hash.with_indifferent_access
      Moulding.create!(row.to_hash.symbolize_keys)
    end

  end

end


user@server:/spatial_project$: rake import_incidents_csv:create_incidents --trace
** Invoke import_incidents_csv:create_incidents (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute import_incidents_csv:create_incidents
rake aborted!
invalid byte sequence in UTF-8
/usr/lib/ruby/1.9.1/csv.rb:1855:in `sub!'
/usr/lib/ruby/1.9.1/csv.rb:1855:in `block in shift'
/usr/lib/ruby/1.9.1/csv.rb:1849:in `loop'
/usr/lib/ruby/1.9.1/csv.rb:1849:in `shift'
/usr/lib/ruby/1.9.1/csv.rb:1791:in `each'
/usr/lib/ruby/1.9.1/csv.rb:1805:in `to_a'
/usr/lib/ruby/1.9.1/csv.rb:1805:in `read'
/usr/lib/ruby/1.9.1/csv.rb:1379:in `parse'
/home/mgmacri/rails/mymap/lib/tasks/import_incidents_csv.rake:8:in `block (2 levels) in                                     
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `call'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `block in execute'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `each'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `execute'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:166:in `block in         invoke_with_call_chain'
/usr/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:159:in `invoke_with_call_chain'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:152:in `invoke'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:143:in `invoke_task'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block (2 levels)     in top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `each'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block in   top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:110:in `run_with_threads'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:95:in `top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:73:in `block in run'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:160:in  `standard_exception_handling'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:70:in `run'
/var/lib/gems/1.9.1/gems/rake-10.0.3/bin/rake:33:in `<top (required)>'
/usr/local/bin/rake:19:in `load'
/usr/local/bin/rake:19:in `<main>'
Tasks: TOP => import_incidents_csv:create_incidents
Matthew Macri
  • 89
  • 2
  • 14

2 Answers2

3

Excel encodes the file into ISO-8859-1 and not in UTF-8. So tell Ruby to open the file with read only in ISO-8859-1

file=File.open("input_file", "r:ISO-8859-1")
Sagar Bommidi
  • 1,409
  • 8
  • 12
  • Excel uses an ANSI codepage by default, not an ISO charset. These aren't quite the same thing. For example, ISO-8859-1 is similar but not identical to cp1252 (otherwise known as Windows-1252). It's best to use the correct codepage rather than guess at a close-enough encoding - or better yet, save your Excel sheets in UTF-8 using OpenOffice and retain your sanity. Note that Excel will use a different codepage on different systems; eg a central european users will probably send you cp1251 text. See http://stackoverflow.com/questions/508558/what-charset-does-microsoft-excel-use-when-saving-files – Craig Ringer Mar 21 '13 at 01:04
  • 1
    See also http://en.wikipedia.org/wiki/Windows-1252 which explains among other things that Windows-1252 isn't even ANSI-standard even though Windows calls it an "ANSI" codepage. – Craig Ringer Mar 21 '13 at 01:10
2

Using postgresql's native CSV import is a couple of orders of magnitude faster than using Ruby's CSV API, and may also avoid the same encoding issues.

For example:

namespace :import_incidents_csv do
  task :create_incidents => :environment do
    ActiveRecord::Base.connection.execute "COPY moulding (name, state, postcode, lat, long) FROM '/home/mgmacri/data/PublicLibraryBranchLocations.csv' DELIMITER ',' CSV;"
  end
end

More info: http://www.postgresql.org/docs/9.2/static/sql-copy.html

Chris Aitchison
  • 4,656
  • 1
  • 27
  • 43