I have a dataframe like this:

df = pd.DataFrame(index=['pre1_xyz', 'pre1_foo', 'pre3_bar', 'pre3_foo', 'pre10_foo', 'pre10_bar', 'pre10_xyz'])

to which I want to add a column values, where each row's value is determined by the prefix of its index label using a function return_something(pref). Right now I implement that as follows:

import pandas as pd
import numpy as np

# this just returns a random value for the sake of simplicity
def return_something(pref):
    return np.random.choice(len(pref) + 10)


df = pd.DataFrame(index=['pre1_xyz', 'pre1_foo', 'pre3_bar', 'pre3_foo', 'pre10_foo', 'pre10_bar', 'pre10_xyz'])

# get all the unique prefixes
unique_pref = set([pi.partition('_')[0] for pi in df.index])

# determine the value for each prefix
val_pref = {pref: return_something(pref) for pref in unique_pref}

# add the values to the dataframe
for prefi, vali in val_pref.items():

    # determine all rows with the same prefix
    rows = [rowi for rowi in df.index if rowi.startswith(prefi+'_')]

    df.loc[rows, 'values'] = vali

That then gives me the desired output:

           values
pre1_xyz        0
pre1_foo        0
pre3_bar        7
pre3_foo        7
pre10_foo      13
pre10_bar      13
pre10_xyz      13

The question is whether there is anything smarter than this, e.g. a solution that avoids creating unique_pref and/or val_pref, and/or makes use of set_value, which seems to be the fastest way to add values to a dataframe, as discussed in this question.

Cleb
1 Answer

Because prefixes repeat across rows, you first need to extract the prefix and remove duplicates; otherwise you would generate a new random number for every row instead of one per prefix. I did this in a more condensed way by making a new column for the prefix and then using df.prefix.unique().

df['prefix'] = [i.split('_')[0] for i in df.index]
prefixes = df.prefix.unique()
df['values'] = df.prefix.map(dict(zip(prefixes, [return_something(i) for i in prefixes])))
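
For reference, here is a self-contained sketch of the same idea that skips the helper prefix column. It assumes the prefix is everything before the first underscore and reuses the placeholder return_something from the question; the intermediate names prefix and val_pref are illustrative only.

```python
import numpy as np
import pandas as pd


def return_something(pref):
    # Stand-in from the question: a random integer in [0, len(pref) + 10).
    return np.random.choice(len(pref) + 10)


df = pd.DataFrame(index=['pre1_xyz', 'pre1_foo', 'pre3_bar', 'pre3_foo',
                         'pre10_foo', 'pre10_bar', 'pre10_xyz'])

# Prefix of every row label (text before the first underscore),
# kept as a Series aligned with df.index instead of a new column.
prefix = df.index.to_series().str.split('_').str[0]

# Call return_something once per distinct prefix ...
val_pref = {pref: return_something(pref) for pref in prefix.unique()}

# ... then look up each row's prefix in that mapping.
df['values'] = prefix.map(val_pref)
```

This still builds the prefix-to-value dictionary, since that is what guarantees one call per prefix; only the extra column is avoided.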
A.Kot
  • @Cleb consider marking it as the answer if it solved your problem. – piRSquared Oct 19 '16 at 15:41
  • @piRSquared: I will, if nothing better shows up. I usually wait for a while until I accept since sometimes an even better solution comes up. If you look at my profile, you will see that I always accept an answer if there is (a good) one. – Cleb Oct 19 '16 at 15:46
  • @Cleb that is totally fair. I appreciate you saying so. I tend to give reminders to folks because I see it too often. – piRSquared Oct 19 '16 at 15:50