I have a data frame df1 that looks like this:

user     data                               dep                    
1        ['dep_78','fg7uy8']                78
2        ['the_dep_45','34_dep','re23u']    45
3        ['fhj56','dep_89','hgjl09']        91

I want to focus on the values in the "data" column that contain the string "dep" and check whether the number attached to that string matches the number in the "dep" column. For example, dep_78 in the data column for user 1 matches 78 in the dep column. I want to output the rows with a mismatch, so the result should be:

user     data                      dep
2        ['the_dep_45','34_dep']   45
3        ['dep_89']                91

The problem is to pick out only the values in the data column that contain the string "dep", and then compare the numbers attached to those strings with the "dep" column.

ComplexData
  • The numbers attached to all the strings containing "dep" in the column "data" should match the numbers in the "dep" column. dep_89 in data is a mismatch with 91 in the dep column. – ComplexData Aug 07 '17 at 21:22
  • It's my fault for looking on a phone, I missed `dep` in the first block. Still, I think your first step is splitting the strings in `data`? Why do you have a dataframe in this format in the first place? – roganjosh Aug 07 '17 at 21:24
  • Can you provide some context for your question? What have you tried so far? Why not refactor your dataframe as suggested to you [here](https://stackoverflow.com/questions/45552952/extracting-specific-rows-from-a-data-frame/45553169#45553169)? – RagingRoosevelt Aug 07 '17 at 21:24
  • Possible duplicate of [Extracting specific rows from a data frame](https://stackoverflow.com/questions/45552952/extracting-specific-rows-from-a-data-frame) – RagingRoosevelt Aug 07 '17 at 21:26

2 Answers

How about this?

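import re

# Raw string so that \d is not treated as a string escape
r = re.compile(r'\d+')

# Assumes each 'data' value is a string; only the first run of digits is checked
idx = df.apply(lambda x: str(x['dep']) in r.search(x['data']).group(0), axis=1)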

0     True
1     True
2    False
dtype: bool


df[idx]

   user                             data  dep
0     1              ['dep_78','fg7uy8']   78
1     2  ['the_dep_45','34_dep','re23u']   45
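
Note that the snippet above keeps the matching rows and only looks at the first number in each string. If the goal is the mismatching rows, a variant of the same regex idea could look like the sketch below. It rests on two assumptions that are not stated in the question: a number counts as "attached" when it is joined to dep by an underscore on either side (dep_78 or 34_dep), and the numbers are compared as text. The names DEP_NUM, dep_numbers and is_mismatch are made up for illustration.

```python
import re

# Assumption: a number is "attached" to dep when joined by an underscore,
# on either side (dep_78 or 34_dep).
DEP_NUM = re.compile(r'(\d+)_dep|dep_(\d+)')

def dep_numbers(text):
    """Return the numbers attached to 'dep' in text, as strings."""
    return [before or after for before, after in DEP_NUM.findall(text)]

def is_mismatch(data, dep):
    """True if any number attached to 'dep' differs from the dep value."""
    return any(num != str(dep) for num in dep_numbers(str(data)))

# Plain tuples standing in for the rows of the question's data frame
rows = [
    (1, "['dep_78','fg7uy8']", 78),
    (2, "['the_dep_45','34_dep','re23u']", 45),
    (3, "['fhj56','dep_89','hgjl09']", 91),
]
mismatched_users = [user for user, data, dep in rows if is_mismatch(data, dep)]
print(mismatched_users)  # [2, 3]
```

On a data frame, the same check should work row-wise as `df[df.apply(lambda x: is_mismatch(x['data'], x['dep']), axis=1)]`.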
gold_cy

You can do it like this:

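def select(row):
    # Keep the values that contain 'dep' but differ from 'dep_<dep>'
    keystring = 'dep_' + str(row['dep'])
    result = []
    for one in row['data']:
        if one != keystring and 'dep' in one:
            result.append(one)
    return result

# Assumes each 'data' value is a list of strings
df['data'] = df.apply(select, axis=1)
df['datalength'] = df['data'].map(len)
# Keep rows with at least one remaining value; show the first three columns
result = df[df['datalength'] > 0][df.columns[:3]]
print(result)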
   user                  data  dep
1     2  [the_dep_45, 34_dep]   45
2     3              [dep_89]   91
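
The select function above iterates over row['data'], so it expects each value to be a list. If the column actually holds strings that only look like lists (the question does not say which), one possible way to convert them first is ast.literal_eval from the standard library. This is a minimal sketch under that assumption; the helper name to_list is hypothetical.

```python
import ast

def to_list(value):
    """Parse a string such as "['a','b']" into a list; pass lists through."""
    if isinstance(value, str):
        # literal_eval only evaluates Python literals, not arbitrary code
        return ast.literal_eval(value)
    return list(value)

raw = "['the_dep_45','34_dep','re23u']"
print(to_list(raw))  # ['the_dep_45', '34_dep', 're23u']
```

It could be applied before calling select with `df['data'] = df['data'].map(to_list)`.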
giser_yugang