Let's say I have an empty DataFrame that is already set up with columns but has no rows. I'm scraping some data from the web, so let's say I need to add a row with the index '2176' to the empty DataFrame. How could I automatically add this row to the DataFrame when I try to assign to it? Is this even what pandas is meant for, or should I use something else?

Cleb

Hexagon789
2 Answers
As an alternative to .loc, you might want to consider .at. Using @NickBraunagel's example:
import pandas as pd
df = pd.DataFrame(columns=['foo1','foo2'])
Then
df.at['2716', 'foo1'] = 10
yields
foo1 foo2
2716 10 NaN
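One thing worth checking when the labels come from scraped text: '2716' here is a string label, while the other answer uses the integer 2176, and pandas treats a string and an integer with the same digits as two different labels. A minimal sketch (the value 99 and the second assignment are purely illustrative):

```python
import pandas as pd

df = pd.DataFrame(columns=['foo1', 'foo2'])
df.at['2716', 'foo1'] = 10  # string label, as in this answer
df.at[2716, 'foo1'] = 99    # integer label: pandas adds a second, separate row

print(df)
print(len(df))  # 2 rows, not 1
```

If the scraped IDs arrive as strings, converting them once (for example with int(...)) before assigning keeps later lookups consistent.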
Timings for setting a single cell are quite different:
# @NickBraunagel's solution
%timeit df.loc['2716', 'foo1'] = 10
1000 loops, best of 3: 212 µs per loop
# the at solution
%timeit df.at['2716', 'foo1'] = 10
100000 loops, best of 3: 12.5 µs per loop
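Keep in mind that %timeit runs the statement many times, so only the first iteration actually creates the '2716' row; every later loop overwrites a cell that already exists. A rough sketch that times insertion and updating separately with the standard timeit module (the function names are hypothetical, and the insert timings also include building the empty frame):

```python
import timeit

import pandas as pd

COLUMNS = ['foo1', 'foo2']

def insert_with_loc():
    # A fresh empty frame on every call, so the row really is new each time.
    df = pd.DataFrame(columns=COLUMNS)
    df.loc['2716', 'foo1'] = 10
    return df

def insert_with_at():
    df = pd.DataFrame(columns=COLUMNS)
    df.at['2716', 'foo1'] = 10
    return df

# A frame in which the row already exists, used to time plain updates.
existing = insert_with_loc()

def update_with_loc():
    existing.loc['2716', 'foo1'] = 10

def update_with_at():
    existing.at['2716', 'foo1'] = 10

N = 200
for func in (insert_with_loc, insert_with_at, update_with_loc, update_with_at):
    per_call = timeit.timeit(func, number=N) / N
    print(f'{func.__name__}: {per_call * 1e6:.1f} µs per call')
```

As far as I can tell from the pandas source, .at falls back to .loc when the row label is missing, so the two insert timings are likely to be much closer to each other than the two update timings. Exact numbers will depend on your pandas version and machine.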
If you want to add several column entries at the same time, you can do:
d = {'foo1': 20, 'foo2': 10}
df.at['1234', :] = d
yielding
foo1 foo2
2716 10 NaN
1234 20 10
However, make sure to always add values of the same data type to a given column to avoid errors or other undesired effects, as explained here.
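Because .at is documented as accessing a single value for a row/column label pair, a whole row may be safer to set with .loc. A minimal sketch, assuming a reasonably recent pandas: wrapping the dict in a Series makes .loc align the values to the columns by name, and infer_objects() afterwards re-infers numeric dtypes, since the columns of a DataFrame created with only column names start out as object dtype.

```python
import pandas as pd

df = pd.DataFrame(columns=['foo1', 'foo2'])
print(df.dtypes)  # both columns are object dtype while the frame is empty

df.loc['2716', 'foo1'] = 10

# Setting with enlargement: '1234' is a new label, so .loc appends a row.
# The Series is aligned to df.columns by name, not by position.
d = {'foo1': 20, 'foo2': 10}
df.loc['1234'] = pd.Series(d)

# Re-infer proper dtypes now that the rows are in (object -> int64/float64).
df = df.infer_objects()
print(df)
print(df.dtypes)
```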

Cleb
- Good call, assuming you're only updating one value/cell at a time (which works for this example). For reference: https://stackoverflow.com/a/37216587/4245462 – NickBraunagel Dec 30 '17 at 02:59
- @NickBraunagel: I guess this assumption is valid as OP was talking about single rows. Thanks for the reference! – Cleb Dec 30 '17 at 03:04
import pandas as pd
df = pd.DataFrame(columns=['foo1','foo2'])
df.loc[2176,'foo1'] = 'my_value'
df is then:
foo1 foo2
2176 my_value NaN
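On the second part of the question: setting with enlargement like this works, but each new row generally means pandas allocates new arrays for the frame, so it gets slow if you scrape many rows one at a time. A common alternative is to collect the scraped rows in plain Python lists and build the DataFrame once at the end. A minimal sketch, where scraped is a hypothetical stand-in for whatever your scraper yields:

```python
import pandas as pd

# Hypothetical scraped results: each item is (row label, {column: value}).
scraped = [
    ('2176', {'foo1': 'my_value'}),
    ('1234', {'foo1': 20, 'foo2': 10}),
]

labels = []
records = []
for label, values in scraped:
    labels.append(label)
    records.append(values)

# Build the frame once; columns missing from a record become NaN.
df = pd.DataFrame(records, index=labels, columns=['foo1', 'foo2'])
print(df)
```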

NickBraunagel
- More details: https://github.com/pandas-dev/pandas/issues/2801#issuecomment-17644076 – Tomek C. Apr 29 '21 at 15:47