1

Here is sample jason data.

id  opened_date      title       exposure state

1  06/11/2014 9:28 AM Device rebooted and crashed with error 0x024 critical open

2  06/11/2014 7:12 AM Not able to connect to WiFi                  High     open

3  07/23/2014 2:11 PM Sensor failed to recognize movement          Low open

4  07/07/2014 5:20 PM When sensor activated, device rebooted with error 0x024 critical closed

I want to write a code with input as a string and output shall point to ID.

For example:

Input String = Sensor : Output = ID 3 and 4 has 'Sensor' word in it
Input String = 0x024  : Output = ID 1 and 4 has '0x024' in it.

I'm guessing this will require some kind of groupby but it works on complete data set rather than a string. Is this possible with pandas or is there any other better solution to analyze this?

Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194
yogi_peace
  • 111
  • 2
  • 5
  • you can do `df.loc[df['title'].str.contains('Sensor'),'id]` so pass your input string as the arg for `str.contains` – EdChum May 23 '16 at 10:33

1 Answers1

3

You can use loc for selecting by condition created str.contains with parameter case=False. Last if you need list use tolist:

li = ['Sensor','0x024']

for i in li:
    print (df.loc[df['title'].str.contains(i, case=False),'id'].tolist())
    [3, 4]
    [1, 4]

For storing you can use dict comprehension:

dfs = { i: df.loc[df['title'].str.contains(i, case=False),'id'].tolist() for i in li }

print (dfs['Sensor'])
[3, 4]
print (dfs['0x024'])
[1, 4]

If you need function, try get_id:

def get_id(id):
    ids = df.loc[df['title'].str.contains(id, case=False),'id'].tolist()
    return "Input String = %s : Output = ID " % id + 
            " and ".join(str(x) for x in ids) + 
            " has '%s' in it." % id

print (get_id('Sensor'))
Input String = Sensor : Output = ID 3 and 4 has 'Sensor' in it.

print (get_id('0x024'))
Input String = 0x024 : Output = ID 1 and 4 has '0x024' in it.

EDIT by comment:

Now it is more complicated, because use logical and:

def get_multiple_id(ids):
    #split ids and crete list of boolean series containing each id
    ids1 = [df['title'].str.contains(x, case=False) for x in ids.split()]
    #http://stackoverflow.com/a/20528566/2901002
    cond = np.logical_and.reduce(ids1)

    ids = df.loc[cond,'id'].tolist()
    return "Input String = '%s' : Output = ID " % id +
           ' and '.join(str(x) for x in ids) +
           " has '%s' in it." % id

print (get_multiple_id('0x024 Sensor'))
Input String = '0x024 Sensor' : Output = ID 4 has '0x024 Sensor' in it.

If use logical or, it is more easier, because or in re is |, so you can use 0x024|Sensor:

def get_multiple_id(id):
    ids = df.loc[df['title'].str.contains(id.replace(' ','|'), case=False),'id'].tolist()
    return "Input String = '%s' : Output = ID " % id +
            ' and '.join(str(x) for x in ids) +
            " has '%s' in it." % id

print (get_multiple_id('0x024 Sensor'))
Input String = '0x024 Sensor' : Output = ID 1 and 3 and 4 has '0x024 Sensor' in it.
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I should have re-framed my question to add one more case. Apologies for asking question in a comment. How to integrate this use case: Input String = '0x024, sensor' : Output = ID 4 has '0x024' and 'sensor' in it. ? – yogi_peace May 24 '16 at 03:55
  • Thanks @jezrael . This helped me alot. This is my day2 in Pandas and I am already loving it. Here is next thread I posted: [link](http://stackoverflow.com/questions/37411300/pandas-split-text-in-column-and-search-in-rows) – yogi_peace May 24 '16 at 10:45
  • Thank you. But your question is unclear for me. How you can split text? by ` ` or by [ ? – jezrael May 24 '16 at 10:53
  • Or you need common words in ech line? – jezrael May 24 '16 at 10:57
  • Hi jezrael, for selected title, i want to see if any other title in same table has some common words into it. for example, when we submit a new question on stackoverflow, depending on title, we get suggestions of matching issues . – yogi_peace May 24 '16 at 12:00
  • Ok, so you need find some of strings `[VERY URGENT] Pay your rent` maybe without `[]` in all rows of column `title`? You need find `very` or `urgent` or `pay` or `your` or `rent`? – jezrael May 24 '16 at 12:02
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/112784/discussion-between-yogi-peace-and-jezrael). – yogi_peace May 24 '16 at 12:04