0

I am trying to extract some data that is being returned in a nested hash from Mailchimp. Here is a trimmed down version of the results I am getting. For each email there are multiple GROUPINGS and for each GROUPING there are multiple GROUPS.

My objective is to get this into a mysql table with a layout like: email_addr, list, grouping1_id, grouping1_name, group1_name, group1_interest, group2_name, group2_interest, grouping2_id, grouping2_name, etc. So there is one row per subscriber with all the grouping and group information.

{"email"=>"dummy@gmail.com", "merges"=>{"EMAIL"=>"dummy@gmail.com", 
"GROUPINGS"=>[{"id"=>1, "name"=>"Grouping One", "groups"=>[{"name"=>"Group One", 
"interested"=>false}, {"name"=>"Group", "interested"=>true}, 
{"name"=>"Group Three", "interested"=>true}]}, {"id"=>2, "name"=>"Grouping Two", 
"groups"=>[{"name"=>"Group Four", "interested"=>false}, 
{"name"=>"Group Five", "interested"=>false}]}]}}

Right now, the code I have below runs and inserts the results of the nested blocks into the table but there is one row for each pass through the groups.each_with_index statement. My approach so far seems overly complicated but I am not sure of how to approach this to process the data correctly.

Any help is appreciated.

UPDATED: I cleaned up the logic a bit and separated the database writes into each level of the hash processing. Now the data is inserted and updated in the database correctly. Although this still feels very inelegant.

def organize_members_subs
  @members_data = @members_subs["data"]
  @members_data.each do |member|
    @email_addr = member["email"]
    @db.query("INSERT INTO db.details
        (email_addr, list)
        VALUES  ('#{@email_addr}', '#{@list}' ) ")

    groupings = member["merges"]["GROUPINGS"]
    groupings.each_with_index do |grouping, index|
      @groupings_name = grouping["name"]
      @groupings_id = grouping["id"]
      @groupings_label = "grp#{index}_"
      @db.query("UPDATE db.details
        SET grouping#{index}_id = '#{@groupings_id}'
          , grouping#{index}_name = '#{@groupings_name}'
        WHERE email_addr = '#{@email_addr}' ")

      groups = member["merges"]["GROUPINGS"][index]["groups"] 
      groups.each_with_index do |group, index|
        @group_name = group["name"]
        @group_interested = group["interested"]
        @db.query("UPDATE db.details
        SET #{@groupings_label}group#{index}_name = '#{@group_name}'
          , #{@groupings_label}group#{index}_int = '#{@group_interested}'
        WHERE email_addr = '#{@email_addr}' ")

        break if index == groups.length
      end
      break if index == groupings.length
    end
  end
end
analyticsPierce
  • 2,979
  • 9
  • 57
  • 81
  • As a start, how about making `@groupingsXname` and `@groupingsX_id` an array of hashes, so you could replace the six assignments with `@groupings[index] = {"name" => grouping["name"], "id" => grouping["id"]}` or maybe just make `grouping` an array of hashes? Also, consider using `:name` and `:id`, rather than `"name"` and `"id"` as keys. Lastly, it would be helpful to put `{"email"=>"email",...` on several lines, with line-continuation characters ``\``, so readers don't have to scroll horizontally to read that line. – Cary Swoveland Jan 03 '14 at 00:50
  • @CarySwoveland Thanks for the advice. I made a few changes and do have the data writing into the database correctly now. It still seems like it can be improved though. What suggestions do you have? If you add an answer I will accept it. – analyticsPierce Jan 03 '14 at 06:37
  • Can you clarify, is the first `{"email" => "email"...` actually referring to an email address? So in real use it would be `{"email" => "andy@example.com"....`? I know you've pared the example down, but can you use dummy email addresses if that's what's actually going on? I'm not familiar with Mailchimp. – Beartech Jan 03 '14 at 06:54
  • @Beartech yes, you are correct. Poor placeholder choice on my part. I added a dummy email address to the example. – analyticsPierce Jan 03 '14 at 07:09
  • So the `{"email"=>"dummy@gmail.com", ` can be stripped away to remove the outer hash? Are the columns constant? i.e. will there be a variable number of Groupings? And a variable number of groups inside groupings? Also, any chance of using Rails for this? – Beartech Jan 03 '14 at 07:20
  • @Beartech yes, we could pare this down to remove the outer hash. The columns are constant. There can be more GROUPINGS and more GROUPS. Unfortunately, Rails is not an option for this. – analyticsPierce Jan 03 '14 at 16:12
  • Where is "@list" coming from in your example? – Beartech Jan 03 '14 at 16:14
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/44436/discussion-between-analyticspierce-and-beartech) – analyticsPierce Jan 03 '14 at 16:29
  • I'll see if I can make any further suggestions. – Cary Swoveland Jan 03 '14 at 19:39
  • Please do not feel you have to honor your offer to accept my answer were I to provide one. I hope my answer helps, but @Beartooth has given a very thoughtful and interesting answer, drawing on his knowledge of both Rails and databases, neither of which are specialties of mine. – Cary Swoveland Jan 03 '14 at 22:38
  • I'll admit I'm biased for my answer, but I aimed to give you more 'elegant" and ruby-like code. This code isn't based on Rails, symbols are preferred in Ruby too, but it could work well in Rails. Going with a class rather than just a method gives you far greater flexibility now and in the future. Also I tried to leverage variable scope to reduce the overly complex `if` statements. The instance methods in the class make it easier to make changes in the future. Methods should really do just one thing. – Beartech Jan 04 '14 at 21:16
  • Both of these answers are fantastic. I'm going through them both and learning a lot from each. Thank you both for your insight. – analyticsPierce Jan 05 '14 at 04:03
  • It is unfortunate that I have to pick an answer, both of these are correct and valuable. I think @Beartechs answer is more maintainable in the long-term and certainly in Rails. – analyticsPierce Jan 05 '14 at 05:25
  • I will give the answer to @CarySwoveland because his first comment solved my immediate problem and his answer also made many improvements. Thanks to both of you. If you come through Denver I'll buy you a beer. – analyticsPierce Jan 05 '14 at 05:26

2 Answers2

1

First, maybe extra, but I like to work with symbols since I do a lot of my work in Rails. So let's steal a method from here: How do I convert a Ruby hash so that all of its keys are symbols?

def recursive_symbolize_keys(h)
  case h
    when Hash
      Hash[
          h.map do |k, v|
            [ k.respond_to?(:to_sym) ? k.to_sym : k, recursive_symbolize_keys(v) ]
          end
      ]
    when Enumerable
      h.map { |v| recursive_symbolize_keys(v) }
    else
      h
  end
end

OK, lets build a class to make this easier to manipulate and extend as our needs change:

class MemberSub
  attr_accessor :email, :groupings, :data_hash, :list, :data_row, :db_sql

  def initialize(data_hash)

    #convert all keys to symbols
    @data_hash = recursive_symbolize_keys(data_hash)
    @email = @data_hash[:email]
    @list = 'Members'
    @groupings = @data_hash[:merges][:GROUPINGS]
    @data_row  = data_row
    @db_sql = db_insert
  end

  def data_row
    #returns a data row for DB
    row_hash = {}
    row_hash['email'] = @email
    row_hash['list'] = @list
    gc = 1
    #iterate through groupings
    @groupings.each_with_index do |grouping, index|
      row_hash["grouping#{index + 1}_id"] = grouping[:id]
      row_hash["grouping#{index + 1}_name"] = grouping[:name]
      #iterate through the groups
      grouping[:groups].each do |group|
        row_hash["group#{gc}_name"] = group[:name]
        row_hash["group#{gc}_interest"] = group[:interested]
        gc += 1
      end
    end
    row_hash
  end

  def db_insert
    "INSERT INTO db.details (#{@data_row.keys}) VALUES (#{@data_row.values})".tr('[]','')
  end  

end

Now you can do feed it a row using what ever iteration method and make a new object:

row = MemberSub.new({"email"=>"dummy@gmail.com", "list"=>"Members", "merges"=>  
{"EMAIL"=>"dummy@gmail.com", "GROUPINGS"=>[{"id"=>1, "name"=>"Grouping One", "groups"=>  
[{"name"=>"Group One", "interested"=>false}, {"name"=>"Group Two", "interested"=>true},  
{"name"=>"Group Three", "interested"=>true}]}, {"id"=>2, "name"=>"Grouping Two", "groups"=>  
[{"name"=>"Group Four", "interested"=>false}, {"name"=>"Group Five", "interested"=>false}]}]}})

and make a query:

db.query(row.db_sql)

db.query(INSERT INTO db.details ("email", "list", "grouping1_id", "grouping1_name", 
"group1_name", "group1_interest", "group2_name", "group2_interest", "group3_name", 
"group3_interest", "grouping2_id", "grouping2_name", "group4_name", "group4_interest",
 "group5_name", "group5_interest") VALUES ("dummy@gmail.com", "Members", 1, "Grouping One",
 "Group One", false, "Group Two", true, "Group Three", true, 2, "Grouping Two", "Group Four",
 false, "Group Five", false))

The other methods should be self explanatory. You don't have to have them all available as attar_accessor but I just did that for example.

Community
  • 1
  • 1
Beartech
  • 6,173
  • 1
  • 18
  • 41
  • If you want to leave out the `recursive_symbolize_keys` method, just replace the symbols in the rest of the code with quoted strings. – Beartech Jan 03 '14 at 20:41
1

To start, I wanted to take a closer at look your hash. Rather than reformatting it myself, I did this:

require "awesome_print"
h = `{"email"=>..., "interested"=>false}]}]}}`
ap h

Scroll down to the bottom of my answer to see ap's formatting of the hash.

I will answer your question assuming the db structure is a given, but would like to make a few points:

  • If "id" is unique for each grouping record, could you make that the key, and dispense with index?
  • If "name" is unique for each grouping record, could you dispense with both "id" and index?
  • If "name" is unique for each group record (for a given grouping), could you just have group["name"] => group["interested"] for each group?

Moving on to your code, I will also assume the structure of your hash is given. Later, I will revisit that assumption.

The changes I propose to your code are fairly minor and some are purely stylistic:

  • Make all instance variables local variables, meaning that two additional arguments must be passed to def organize_members_subs.
  • With two possible exceptions, eliminate local variables that are only used once after being defined. For example, rather than groupings_id = grouping["id"], then SET grouping#{index}_id = '#{@groupings_id}', just have SET grouping#{index}_id = '#{grouping["id"]}'. The two possible exceptions are groupings and groups. For example, you could get rid of the former by writing member["merges"]["GROUPINGS"].each_with_index do |grouping, index_1|. I'd keep them as variables (so I could easily check their values), but that's a stylistic decision.
  • The variable index in groupings.each_with_index do |grouping, index| is in scope within the inner block, which uses an iterator variable with the same name. I presume that the latter takes precedence, but they should be named differently. I've changed them to index_out and index_in, respectively.
  • index_out ranges from 0 to groupings.length-1, so break if index_out == groupings.length will never be executed, and therefore may be removed. Ditto for break if index_in == groups.length.
  • I moved groupings_label = "grp#{index}_" down to draw attention to the fact that it is needed only later, not in the preceding SET expression.

These changes result in the following:

    def organize_members_subs(db, list, @members_subs["data"])
      members_data.each do |member|
        email_addr = member["email"]
        db.query("INSERT INTO db.details
            (email_addr, list)
            VALUES  ('#{email_addr}', '#{list}' ) ")
        groupings = member["merges"]["GROUPINGS"]
        groupings.each_with_index do |grouping, index_out|
          db.query("UPDATE db.details
          SET grouping#{index_out}_id   = '#{grouping["id"]}'
            , grouping#{index_out}_name = '#{grouping["name"]}'
          WHERE email_addr = '#{email_addr}' ")
          groupings_label = "grp#{index_out}_"

          groups = member["merges"]["GROUPINGS"][index_out]["groups"] 
          groups.each_with_index do |group, index_in|
            db.query("UPDATE db.details
            SET #{groupings_label}group#{index_in}_name = '#{group["name"]}'
              , #{groupings_label}group#{index_in}_int = '#{group["interested"]}'
            WHERE email_addr = '#{email_addr}' ")   
          end
        end
      end
    end

Looking at your hash, I am wondering if you could simplify it to the following (formatting courtesy of awesome print):

  {
      "email" => "dummy@gmail.com",
     "merges" => {
             "EMAIL" => "dummy@gmail.com",
         "GROUPINGS" => {
             1 => {
                   "name" => "Grouping One",
                 "groups" => {
                       "Group One" => false,
                       "Group Two" => true,
                     "Group Three" => true
                 }
             },
             2 => {
                   "name" => "Grouping Two",
                 "groups" => {
                     "Group Four" => false,
                     "Group Five" => false
                 }
             }
         }
     }
  }

or even

{
     "email" => "dummy@gmail.com",
    "merges" => {
            "EMAIL" => "dummy@gmail.com",
        "GROUPINGS" => {
            "Grouping One" => {
                  "Group One" => false,
                  "Group Two" => true,
                "Group Three" => true
            },
            "Grouping Two" => {
                "Group Four" => false,
                "Group Five" => false
            }
        }
    }
}

These are not so much as suggestions, but just food for thought.

Awesome print applied to your hash:

ap h # =>
    {
         "email" => "dummy@gmail.com",
        "merges" => {
                "EMAIL" => "dummy@gmail.com",
            "GROUPINGS" => [
                [0] {
                        "id" => 1,
                      "name" => "Grouping One",
                    "groups" => [
                        [0] {
                                  "name" => "Group One",
                            "interested" => false
                        },
                        [1] {
                                  "name" => "Group",
                            "interested" => true
                        },
                        [2] {
                                  "name" => "Group Three",
                            "interested" => true
                        }
                    ]
                },
                [1] {
                        "id" => 2,
                      "name" => "Grouping Two",
                    "groups" => [
                        [0] {
                                  "name" => "Group Four",
                            "interested" => false
                        },
                        [1] {
                                  "name" => "Group Five",
                            "interested" => false
                        }
                    ]
                }
            ]
        }
    }
Cary Swoveland
  • 106,649
  • 6
  • 63
  • 100