16

I want to start with an empty data frame and then add one row to it at a time. I could even start with an all-zeros data frame, data=pd.DataFrame(np.zeros(shape=(10,2)),columns=["a","b"]), and then replace one row at a time.

How can I do that?

Donbeo
  • Is there a reason you have to do it this way? I would recommend building lists with `append` and then converting to a dataframe when you've generated all the data, if possible. It will be a lot quicker and you can always iterate through subsets of the dataframe afterwards in your analysis if you need to operate on slices. – jmz Feb 12 '14 at 09:50
  • I agree, however note that building lists will be slow, as lists periodically need to be grown by creating a new list with sufficient space and copying the contents. It depends on the size of your data: for small sizes it is irrelevant, for large sizes it will matter. It may be better to use a dict or numpy array for periodic addition of data and then construct the dataframe from that – EdChum Feb 12 '14 at 10:03
  • I am looking for something easy and quick for taking notes of results during an interactive session. My data frame will have few rows, so speed is not a problem. In R I would use rbind(dataframe,row). So you think I should do d=[]-->d.append([3,4])... – Donbeo Feb 12 '14 at 10:17
  • Use `concat` to add a row; see: http://pandas.pydata.org/pandas-docs/stable/merging.html#concatenating-objects – EdChum Feb 12 '14 at 10:31
  • As EdChum says, it doesn't really matter if you're just noting stuff in an interactive session. Our comments really assumed that you were trying to build a dataframe in a loop. I would probably `append` in your situation but that's just habit. So long as the data type works for what you're doing I wouldn't worry too much. – jmz Feb 12 '14 at 11:04
  • A similar question with a detailed answer: https://stackoverflow.com/questions/24036911/how-to-update-values-in-a-specific-row-in-a-python-pandas-dataframe – Anton Tarasenko Dec 06 '18 at 16:30
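
The two suggestions in the comments above (collect rows in a list and convert once, or add a row with `concat`) could look roughly like the sketch below. The row values and the names `rows` and `new_row` are made up for illustration; this is one possible way to do it, not the commenters' exact code.

```python
import pandas as pd

# Collect each result as a plain Python list (illustrative values).
rows = []
rows.append([3, 4])
rows.append([5, 6])

# Build the DataFrame once, after the rows have been collected.
data = pd.DataFrame(rows, columns=["a", "b"])

# Alternatively, add a single row to an existing frame with concat.
# ignore_index=True renumbers the index so the new row gets label 2.
new_row = pd.DataFrame([[7, 8]], columns=["a", "b"])
data = pd.concat([data, new_row], ignore_index=True)
```

For a handful of rows in an interactive session either approach should be fine; the list approach avoids creating a new frame on every addition.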

2 Answers

13

Use .loc for label-based selection. It is important that you understand how to slice properly (http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label) and why you should avoid chained assignment (http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy).

In [14]:

import numpy as np
import pandas as pd
data=pd.DataFrame(np.zeros(shape=(10,2)),columns=["a","b"])
data
Out[14]:
   a  b
0  0  0
1  0  0
2  0  0
3  0  0
4  0  0
5  0  0
6  0  0
7  0  0
8  0  0
9  0  0

[10 rows x 2 columns]
In [15]:

data.loc[2:2,'a':'b']=5,6
data
Out[15]:
   a  b
0  0  0
1  0  0
2  5  6
3  0  0
4  0  0
5  0  0
6  0  0
7  0  0
8  0  0
9  0  0

[10 rows x 2 columns]
EdChum
  • If you are updating the entire row, there is no need to specify columns; `data.loc[2] = 5,6` should be enough. Note that if you want to set every column of the row to the same value, you can type just `data.loc[2] = 3`, and if you provide more values than there are columns, you will get a `ValueError` – Engels Leonhardt Nov 27 '19 at 12:10
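
The three cases described in the comment above can be sketched as follows. The frame is the same all-zeros one from the question; the flag `raised` is a hypothetical name used only to record whether the error occurred.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.zeros(shape=(10, 2)), columns=["a", "b"])

# One value per column: replaces the whole row with label 2.
data.loc[2] = 5, 6

# A single scalar is broadcast to every column of the row.
data.loc[3] = 3

# More values than columns is rejected.
try:
    data.loc[4] = 1, 2, 3
    raised = False
except ValueError:
    raised = True
```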
1

If you are replacing the entire row, you can index by the row label alone; there is no need for row, column slices. ...

data.loc[2]=5,6
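
The same `.loc` assignment can also add a row that does not exist yet, which covers the "start empty and add one row each time" case from the question. A minimal sketch, assuming the frame keeps its default 0, 1, 2, ... index so that `len(data)` is always the next unused label (the row values are illustrative):

```python
import pandas as pd

# Start with an empty frame that only has column names.
data = pd.DataFrame(columns=["a", "b"])

# Assigning to a label that is not in the index appends a new row.
data.loc[len(data)] = [3, 4]
data.loc[len(data)] = [5, 6]
```

This grows the frame on every assignment, so it suits short interactive sessions better than large loops.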