
Let's say I have an empty DataFrame that is already set up with columns but has no rows. I'm scraping some data from the web, so let's say I need to add a row with the index '2176' to the empty DataFrame. How can I have this row added automatically when I try to assign to it? Is this even what pandas is meant for, or should I use something else?

Cleb
Hexagon789

2 Answers


As an alternative to .loc, you might want to consider .at. Using @NickBraunagel's example:

import pandas as pd

df = pd.DataFrame(columns=['foo1', 'foo2'])

Then

df.at['2716', 'foo1'] = 10

yields

     foo1 foo2
2716   10  NaN

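The timings differ considerably (after the first loop the row already exists, so both measurements time updating an existing cell rather than adding a new row):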

# @NickBraunagel's solution
%timeit df.loc['2716', 'foo1'] = 10
1000 loops, best of 3: 212 µs per loop

# the at solution
%timeit df.at['2716', 'foo1'] = 10
100000 loops, best of 3: 12.5 µs per loop
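%timeit is an IPython/Jupyter magic. Outside IPython, a rough equivalent using the standard-library timeit module might look like the sketch below. The absolute numbers depend on your machine and pandas version, so only the relative difference between .loc and .at is meaningful.

```python
import timeit

import pandas as pd

df = pd.DataFrame(columns=["foo1", "foo2"])
# The first assignment adds the row; every timed call below updates that existing cell.
df.loc["2716", "foo1"] = 10


def set_with_loc():
    df.loc["2716", "foo1"] = 10


def set_with_at():
    df.at["2716", "foo1"] = 10


n = 1_000
for name, fn in [("loc", set_with_loc), ("at", set_with_at)]:
    seconds = timeit.timeit(fn, number=n)
    print(f"{name}: {seconds / n * 1e6:.1f} µs per assignment")
```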

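If you want to add several column entries at the same time, use .loc instead. .at is meant for a single row/column label pair, and the slice form df.at['1234', :] = d does not work in recent pandas versions:

d = {'foo1': 20, 'foo2': 10}
df.loc['1234'] = d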

yielding

     foo1 foo2
2716   10  NaN
1234   20   10

However, make sure you always assign values of the same data type to a column, to avoid errors or other undesired effects, as explained here.
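Since the question mentions scraping, another option worth considering (not part of the answers here) is to collect the scraped values in plain Python lists and build the DataFrame once at the end. Pandas then infers each column's dtype from all values together instead of the frame being grown cell by cell. A minimal sketch, where `scraped` is a hypothetical stand-in for whatever your scraper returns:

```python
import pandas as pd

# Hypothetical scraper output: (index label, {column: value}) pairs.
scraped = [
    ("2176", {"foo1": 10}),
    ("1234", {"foo1": 20, "foo2": 10}),
]

labels = [label for label, _ in scraped]
records = [values for _, values in scraped]

# Build the frame in one step; keys missing from a record become NaN.
df = pd.DataFrame(records, index=labels, columns=["foo1", "foo2"])
print(df)
```

Growing the frame row by row is convenient for small amounts of data; building it once is usually the more efficient choice when many rows are collected.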

Cleb
    Good call, assuming you're only updating one value/cell at a time (which works for this example). For reference: https://stackoverflow.com/a/37216587/4245462 – NickBraunagel Dec 30 '17 at 02:59
    @NickBraunagel: I guess this assumption is valid as OP was talking about single rows. Thanks for the reference! – Cleb Dec 30 '17 at 03:04
import pandas as pd

df = pd.DataFrame(columns=['foo1','foo2'])

df.loc[2176,'foo1'] = 'my_value'

df is then:

        foo1        foo2
2176    my_value    NaN
NickBraunagel
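A slightly extended sketch of the same .loc behaviour: assigning to a label that is not yet in the index adds a row, assigning to a label that already exists updates that row, and a list with one value per column fills a whole new row. The values 'other_value', 'a' and 'b' are just placeholders:

```python
import pandas as pd

df = pd.DataFrame(columns=["foo1", "foo2"])

# 2176 is not in the index yet, so .loc adds a new row (foo2 is filled with NaN).
df.loc[2176, "foo1"] = "my_value"

# 2176 now exists, so this updates the same row instead of adding another one.
df.loc[2176, "foo2"] = "other_value"

# A new label with one value per column adds a complete row.
df.loc[2177] = ["a", "b"]

print(df)
```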