replace rows in a pandas data frame

Question

I want to start with an empty data frame and then add one row to it at a time. Alternatively, I could start with a data frame of zeros, `data=pd.DataFrame(np.zeros(shape=(10,2)),columns=["a","b"])`, and then replace one row at a time.

How can I do that?

Is there a reason you have to do it this way? I would recommend building lists with `append` and then converting to a dataframe when you've generated all the data, if possible. It will be a lot quicker and you can always iterate through subsets of the dataframe afterwards in your analysis if you need to operate on slices. — jmz, Feb 12 '14 at 09:50
I agree, however note that building lists will be slow as lists will periodically need to be grown by creating a new list with sufficient space and copying the contents. It depends on the size of your data: for small sizes it is irrelevant, for large sizes it will matter. It may be better to use a dict or numpy array for periodic addition of data and then construct the dataframe from that — EdChum, Feb 12 '14 at 10:03
I am looking for something easy and quick to take notes of results during an interactive session. My data frame will have less than rows so speed is not a problem. In R I would use `rbind(dataframe,row)`. So you think I should do `d=[]` --> `d.append([3,4])`... — Donbeo, Feb 12 '14 at 10:17
Use `concat` to add a row, see: http://pandas.pydata.org/pandas-docs/stable/merging.html#concatenating-objects — EdChum, Feb 12 '14 at 10:31
As EdChum says, it doesn't really matter if you're just noting stuff in an interactive session. Our comments really assumed that you were trying to build a dataframe in a loop. I would probably `append` in your situation but that's just habit. So long as the data type works for what you're doing I wouldn't worry too much. — jmz, Feb 12 '14 at 11:04
A similar question with a detailed answer: https://stackoverflow.com/questions/24036911/how-to-update-values-in-a-specific-row-in-a-python-pandas-dataframe — Anton Tarasenko, Dec 06 '18 at 16:30
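
The two approaches suggested in the comments above (collect rows in a list and convert once, or add a row with `concat`) could look roughly like this. It is a minimal sketch, not code from the commenters; the names `rows` and `new_row` and the sample values are made up for illustration.

```python
import pandas as pd

# Collect each result as a plain Python list while working.
rows = []
rows.append([3, 4])
rows.append([5, 6])

# Build the data frame once, after the rows have been collected.
data = pd.DataFrame(rows, columns=["a", "b"])

# Alternative: add a single row to an existing frame with concat.
# ignore_index=True renumbers the index as 0, 1, 2, ...
new_row = pd.DataFrame([[7, 8]], columns=["a", "b"])
data = pd.concat([data, new_row], ignore_index=True)
```

For a handful of rows noted down in an interactive session either variant should be fine; the list version avoids creating a new frame for every row.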

score 13 · Answer 1 · answered Feb 12 '14 at 09:45

Use .loc for label-based selection. It is important that you understand how to slice properly (http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label) and why you should avoid chained assignment (http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy).
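
To illustrate the chained-assignment point, here is a small sketch that contrasts the two forms. The chained form is left as a comment because its behaviour depends on the pandas version (it may warn, or may not modify the frame at all); the values used are only examples.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.zeros(shape=(10, 2)), columns=["a", "b"])

# Chained form, two separate indexing steps:
#     data["a"][2] = 5
# The first step may return a copy, so the assignment is not
# guaranteed to reach `data`. It is not executed here.

# Single .loc call: row label and column label in one indexing operation.
data.loc[2, "a"] = 5
```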

In [14]:

data=pd.DataFrame(np.zeros(shape=(10,2)),columns=["a","b"])
data
Out[14]:
   a  b
0  0  0
1  0  0
2  0  0
3  0  0
4  0  0
5  0  0
6  0  0
7  0  0
8  0  0
9  0  0

[10 rows x 2 columns]
In [15]:

data.loc[2:2,'a':'b']=5,6
data
Out[15]:
   a  b
0  0  0
1  0  0
2  5  6
3  0  0
4  0  0
5  0  0
6  0  0
7  0  0
8  0  0
9  0  0

[10 rows x 2 columns]

If you are updating the entire row, there is no need to specify columns; `data.loc[2] = 5,6` should be enough. Observe that if you want to update the entire row with the same value, you can type just `data.loc[2] = 3`, and if you provide more values than there are columns, you will get a `ValueError` — Engels Leonhardt, Nov 27 '19 at 12:10
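
The three cases described in the comment above can be tried out with a short sketch like the following. The row labels 3 and 4 and the variable `error_message` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.zeros(shape=(10, 2)), columns=["a", "b"])

# One value per column: a becomes 5, b becomes 6 in row 2.
data.loc[2] = 5, 6

# A single scalar is written to every column of row 3.
data.loc[3] = 3

# Three values for a two-column row: pandas refuses the assignment.
error_message = None
try:
    data.loc[4] = 1, 2, 3
except ValueError as exc:
    error_message = str(exc)
```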

score 1 · Answer 2 · answered Nov 26 '19 at 22:53

If you are replacing the entire row, you can index by the row label alone; row,column slices are not needed. ...

data.loc[2]=5,6

Greg Kendall
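
For the other half of the question (starting from an empty data frame and adding one row at a time), the same `.loc` row assignment can also create rows when the label does not exist yet. This is a sketch under the assumption that the index is the default 0, 1, 2, ... so that `len(data)` is the next free label; it is not part of either answer above.

```python
import pandas as pd

# Start from an empty frame that only declares the columns.
data = pd.DataFrame(columns=["a", "b"])

# Assigning to a label that does not exist yet adds a new row.
data.loc[len(data)] = [3, 4]
data.loc[len(data)] = [5, 6]

# Assigning to an existing label replaces that row in place.
data.loc[0] = [7, 8]
```

Note that the column dtypes of a frame grown this way may not be numeric, so a conversion (for example with `astype`) may be needed before doing arithmetic.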

