2

I have a pivot table I want to iterate over, to store in a database.

                                           age  weekly_income
category_weekly_income category_age
High income            Middle aged   45.527721   15015.463667
                       Old           70.041456   14998.104486
                       Young         14.995210   15003.750822
Low income             Middle aged   45.548155    1497.228548
                       Old           70.049987    1505.655319
                       Young         15.013538    1501.718198
Middle income          Middle aged   45.516583    6514.830294
                       Old           69.977657    6494.626962
                       Young         15.020688    6487.661554

I've played with reshape, melt, various for loops, syntax stabs in the dark, chains of stacks, unstacks, reset_indexes, etc.. The closest I have got is the syntax:

crosstab[1:2].age

With this I can pull individual value cells, however I then can't get the value of the indexes.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
Chris
  • 5,664
  • 6
  • 44
  • 55
  • 1
    what's your expected output? and why do you want to iterate this? Please explain in more details – Anzel Jan 22 '15 at 20:01
  • @Anzel thanks for the clarification request. I happen to be inserting the values into an EAV schema with categorization. I included full details in a [previous question](http://stackoverflow.com/questions/28006366/insert-and-categorize-a-numpy-array-into-a-django-modelled-database-eav-schema) and it became a very broad and unfocused problem. Thanks unutbu for the correction. – Chris Jan 23 '15 at 10:59

1 Answers1

0

You don't need to iterate the dataframe, Pandas has already provided a method to convert dataframe to sql by DataFrame.to_sql(...).

Alternatively, if you want to manually insert data into database, you can use Pandas' to_csv(), for example:

I have a df like this:

df
                     A         B
first second                    
bar   one     0.826425 -1.126757
      two     0.682297  0.875014
baz   one    -1.714757 -0.436622
      two    -0.366858  0.341702
foo   one    -1.068390 -1.074582
      two     0.863934  0.043367
qux   one    -0.510881  0.215230
      two     0.760373  0.274389

# set header=False, and index=True to get the MultiIndex from pivot    
print df.to_csv(header=False, index=True)

bar,one,0.8264252111679552,-1.1267570930327846
bar,two,0.6822970851678805,0.8750144682657339
baz,one,-1.7147570530422946,-0.43662238320911956
baz,two,-0.3668584476904599,0.341701643567155
foo,one,-1.068390451744478,-1.0745823278191735
foo,two,0.8639343368644695,0.043366628502542914
qux,one,-0.5108806384876237,0.21522973766619563
qux,two,0.7603733646419842,0.2743886250125428

This will provide you a nice comma-delimited format which will be easily be used in sql execute query, something like:

data = []
for line in df.to_csv(header=False, index=True).split('\n'):
    if line:
        data.append(tuple(line.split(',')))

data

[('bar', 'one', '0.8264252111679552', '-1.1267570930327846'),
 ('bar', 'two', '0.6822970851678805', '0.8750144682657339'),
 ('baz', 'one', '-1.7147570530422946', '-0.43662238320911956'),
 ('baz', 'two', '-0.3668584476904599', '0.341701643567155'),
 ('foo', 'one', '-1.068390451744478', '-1.0745823278191735'),
 ('foo', 'two', '0.8639343368644695', '0.043366628502542914'),
 ('qux', 'one', '-0.5108806384876237', '0.21522973766619563'),
 ('qux', 'two', '0.7603733646419842', '0.2743886250125428')]

Then it's only the matter of doing an executemany:

...
stmt = "INSERT INTO table (first, second, A, B) VALUES (%s, %s, %s, %s)"
cursor.executemany(stmt, data)
...

Hope this helps.

Anzel
  • 19,825
  • 5
  • 51
  • 52
  • 1
    Huge thanks. I added the parameter `to_csv(..., sep='\t')` in case the labels had commas, and as I'm in Django pulled the values directly from lists using `row = line.split('\t')` then accessing cells using `row[0]` to get income category, `row[1]` to get age, etc. Can't get line breaks into comments but final approach was therefore: `for line in crstab.to_csv(header=False, index=True, sep='\t').split('\n'):` `if line:` `row = line.split('\t')` `age_val = FloatValue(value=row[2], entity_id=entity.id, attribute_id=attrib_age.id, dimension_id=dims_age.get(row[1]))` `age_val.save()` – Chris Jan 23 '15 at 12:12
  • My full code formatted is on [this answer](http://stackoverflow.com/questions/28006366/insert-and-categorize-a-numpy-array-into-a-django-modelled-database-eav-schema/28109707#28109707) – Chris Jan 23 '15 at 12:26
  • @Chris, nice one! Glad you've it sorted – Anzel Jan 23 '15 at 12:52