
I have a pandas DataFrame whose index holds the numeric subject IDs of respondents who participated in a sociological test.

Basically, the question is two-fold.

a) How can I rename a single duplicated index label in a pandas DataFrame?

A portion of data looks like this (first column is index):

subject build   gender_response
7   5.0.6.0 Female
5   5.0.6.0 Male
4   5.0.6.0 Male
3   5.0.6.0 Female
3   5.0.6.0 Female
1   5.0.6.0 Male

For example, I just need to change one of the duplicated index labels ("3") to any other integer.

I have tried the rename function described in the pandas documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.rename.html However, if I set the parameter "inplace" to True, nothing happens:

master.iloc[3].rename(120, inplace=True)

If I assign the result to a new variable and use the same expression without this parameter, it returns a pandas Series with the new index :( But I need the change to be applied to the DataFrame.

master2 = master.iloc[3].rename(120)

b) How can I make the change conditional on a value in another column?

  subject   time    Gender  Age
7   12:30:10    Female  23
5   12:23:10    Male    18
4   12:22:17    Male    36
3   12:16:55    Female  45
3   12:16:16    Female  67
1   12:05:22    Male    28

For example, I have a column "time" recording when the test was taken. I tried to do this with the pandas apply function, something like:

time_point = pd.Timestamp("1/19/2017 12:16:55")
def filter_by_time(x):
    if x["time"] == time_point:
        x.index.rename(120)

I applied it to the rows of the DataFrame.

Thoughts?

  • Just noticed mistake in the last block of code. – Yevhen Barshchevskyi Jan 19 '17 at 09:45
  • time_point = pd.Timestamp(1/18/2017 12:16:55) def filter_by_time(x): if x[time] == time_point: x.index.rename(120) – Yevhen Barshchevskyi Jan 19 '17 at 09:46
  • You should edit your question to fix the mistake. – IanS Jan 19 '17 at 09:50
  • Also, these are two different questions. Can you split your question, i.e. post two separate questions? That's how it works on SO. – IanS Jan 19 '17 at 09:51
  • Finally, can you confirm that `subject` is the index of your dataframe? It looks like it's just another column. – IanS Jan 19 '17 at 09:53
  • `rename` on an index assigns or changes the name attribute; it doesn't change the label. Do the existing index labels matter? If they don't, you could just call `reset_index(drop=True, inplace=True)`. – EdChum Jan 19 '17 at 09:56
  • So, it does matter, because I have two datasets where one is demographic file and another is actual scores. Ultimately, I need to merge these two datasets and use subjectid as "common denominator" – Yevhen Barshchevskyi Jan 19 '17 at 10:00
  • So you want to keep this row or drop it? for instance you could call `reset_index()` and then `drop_duplicates(subset='subject')` if you want to remove it – EdChum Jan 19 '17 at 10:03
  • I need it to be consistent with subject duplicates I have in two datasets. – Yevhen Barshchevskyi Jan 19 '17 at 10:08
  • What about the second part of the question? Making renaming conditional on a specific column value? – Yevhen Barshchevskyi Jan 19 '17 at 10:09
  • Have you seen [this](http://stackoverflow.com/questions/19851005/rename-pandas-dataframe-index)? With `df.rename(index={1: 'a'}, inplace=True)` you can rename existing indices, although you won't be able to eliminate duplicates... I think this could help you with your second question? – nostradamus Jan 19 '17 at 10:16
  • Nope, it does not work. It changes two rows with indexes "3". – Yevhen Barshchevskyi Jan 19 '17 at 10:22
  • Sure, that's what it is supposed to do, I guess. If you want to avoid this, I would propose introducing an additional column containing a "pure" index from 0..x. Then you can still access the subject number, but you have a unique identifier for each row. The part I don't get is your statement "I need it to be consistent with subject duplicates I have in two datasets". If you still need to identify the subjects, you cannot just rename one arbitrarily, right? For me, the only way to keep the connection is to use an additional index column. – nostradamus Jan 19 '17 at 10:31
  • @YevhenBarshchevskyi you can try this using index values, see answer – Rakesh Kumar Jan 19 '17 at 10:56
  • The answer posted by ErnestScribbler [here](https://stackoverflow.com/questions/40427943/how-do-i-change-a-single-index-value-in-pandas-dataframe/49854311#49854311) may be very useful. – Jellis Jan 25 '21 at 06:46

1 Answer

For Query 1,

you are trying to rename an index label by its position, so you can try master.index.values[3] = 120 instead of master.iloc[3].rename(120, inplace=True).
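
A pandas Index is designed to be immutable, so writing into master.index.values may not behave the same way across pandas versions. As a hedged alternative, here is a minimal sketch that copies the labels, changes one by position, and assigns a new index back. The small master frame is rebuilt from the sample in the question and is only an assumption about the real data.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed shape); "subject" is the index.
master = pd.DataFrame(
    {
        "build": ["5.0.6.0"] * 6,
        "gender_response": ["Female", "Male", "Male", "Female", "Female", "Male"],
    },
    index=pd.Index([7, 5, 4, 3, 3, 1], name="subject"),
)

# Copy the labels into a writable array, change one by position, assign back.
labels = master.index.to_numpy().copy()
labels[3] = 120  # position 3 is the first of the two rows labelled 3
master.index = pd.Index(labels, name=master.index.name)

print(master.index.tolist())  # [7, 5, 4, 120, 3, 1]
```

Only the row at position 3 is relabelled; the other row labelled 3 keeps its subject ID.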

For query 2, try this:

def filter_by_time(x):
    if x.name == "time":
        for index, value in enumerate(x):
            if value == pd.Timestamp("1/19/2017 12:16:55").strftime("%H:%M:%S"):
                master.index.values[index] = 120 
master.apply(filter_by_time)
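
The same conditional relabelling can also be sketched without apply, using a boolean mask. This is only an illustration under assumptions: the time column is taken to hold "HH:MM:SS" strings as in the question's sample, and the master frame below is rebuilt from that sample rather than from the real data.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's second table (assumed dtypes).
master = pd.DataFrame(
    {
        "time": ["12:30:10", "12:23:10", "12:22:17", "12:16:55", "12:16:16", "12:05:22"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [23, 18, 36, 45, 67, 28],
    },
    index=pd.Index([7, 5, 4, 3, 3, 1], name="subject"),
)

# True only for the row whose subject is 3 and whose time matches.
mask = (master.index == 3) & (master["time"] == "12:16:55").to_numpy()

# Keep the existing label where the mask is False, use 120 where it is True.
master.index = pd.Index(np.where(mask, 120, master.index), name="subject")

print(master.index.tolist())  # [7, 5, 4, 120, 3, 1]
```

Combining the subject ID with the time condition in the mask is what lets one of the two duplicated rows be singled out.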
Rakesh Kumar