0

I am trying to filter out values from a pandas data-frame and then generate a column with those values. To further clarify myself here is an example

print (temp.head())

Index  Work-Assigned  Location
  A         R            NL
  B         df           MB
  A                      NL
  C         SL           NL
  D         RC           MB
  A         RC           AB

Now what I want to do is to filter out all the R and SL values from this data-frame and create another data-frame with just those values and the index. Something like this:

print (result.head())

Index    R/SL
  A       R
  B      
  C       SL
  D

I tried pivoting the data with Work-Assigned as the value, as you see certain value in the index column is repeated, but that didn't work.

Drise
  • 4,310
  • 5
  • 41
  • 66
  • so something like `df[df['Work-Assigned'].isin(['R', 'SL'])][['Index', 'Work-Assigned']]`? Though it's not clear to me how the rows 'B' and 'D' ended up in your output... – pault Mar 15 '18 at 18:40
  • For an index that does not has either 'R' or 'SL' as values, I want them to have empty spaces or NAN value. To further clarify for an Index that has multiple work values I only want the 'R" and 'SL' values, removing all other values and for an Index that only has a NAN value for work, I want to keep that as it is. –  Mar 16 '18 at 16:32
  • Can an Index have both `'R'` and `'SL'`? What happens then? – pault Mar 16 '18 at 17:41

2 Answers2

0

I believe the following gives the requested output:

# your original dataframe
df = pd.DataFrame({'Index': {0: 'A', 1: 'B', 2: 'A', 3: 'C', 4: 'D', 5: 'A'}, 'Location': {0: 'NL', 1: 'MB', 2: 'NL', 3: 'NL', 4: 'MB', 5: 'AB'}, 'Work-Assigned': {0: 'R', 1: 'df', 2: '', 3: 'SL', 4: 'RC', 5: 'RC'}}).set_index('Index').reindex(['Work-Assigned', 'Location'], axis=1)


df
Out[5]: 
      Work-Assigned Location
Index                       
A                 R       NL
B                df       MB
A                         NL
C                SL       NL
D                RC       MB
A                RC       AB

def some_filtering(df_, filter_values=['R', 'SL']):
    # use regex to create a Series which contains bool of whether any `filter_values` are found
    s_filter = df_['Work-Assigned'].str.extract('^({})$'.format('|'.join(filter_values)), expand=False)

    # if nothing was found then return a blank string; otherwise return the unique value found
    if s_filter.dropna().empty:
        val = ['']
    else:
        val = pd.unique(s_filter.dropna())

    # return a DataFrame containing the unique value found (could be blank) at the present index value passed to .groupby
    return pd.DataFrame(data=val, index=pd.unique(df_.index), columns=['/'.join(filter_values)])


df.groupby(level='Index', group_keys=False).apply(some_filtering)
Out[7]: 
  R/SL
A    R
B     
C   SL
D     
jeschwar
  • 1,286
  • 7
  • 10
0

IIUC, you want to group by Index and collect the values into a set. Then check the set for the values 'R' or 'SL'.

Assuming your DataFrame is named df, you could do the following:

Group by 'Index' and apply the set constructor to the 'Work-Assigned' column. This will condense all distinct values for each Index into one row.

df2 = pd.DataFrame(df.groupby('Index')['Work-Assigned'].apply(set)).reset_index()
print(df2)
#  Index Work-Assigned
#0     A  {nan, R, RC}
#1     B          {df}
#2     C          {SL}
#3     D          {RC}

Next check for the intersection of each row's set with the values you want to search for. If the intersection is null, return an empty string (or np.nan if you prefer). Otherwise, pick the first value.1

my_values = {'R', 'SL'}
df2['Work-Assigned'] = df2['Work-Assigned'].apply(
    lambda x: '' if not my_values.intersection(x) else list(my_values.intersection(x))[0]
)
print(df2)
#  Index Work-Assigned
#0     A             R
#1     B              
#2     C            SL
#3     D              

References

Notes

1 In the case where multiple (in your case both) values exist, you will get one arbitrarily. If that is a problem, please update your problem statement on how you would like to handle that case.

pault
  • 41,343
  • 15
  • 107
  • 149
  • Did this answer your question? If so, please consider accepting it by clicking the check mark. You are under no obligation to do so, but it can be helpful for readers in the future who may come across a similar issue. Read more on [what to do when someone answers my question](https://stackoverflow.com/help/someone-answers). – pault Mar 20 '18 at 16:32