1

I am filtering some data from the csv file which works fine, but while matching a list items via str.conatin regex in pandas it prints the result for the items which is finds but i want to tag the items which does not match like "kpc8472", "kpc1165" these are not present in the CSV file thus not returning any results but i need to know about those missing items to be tagged as well.

import pandas as pd
# server names to be searched on the file in list format    
search_list =  ["kpc2021","kpc8291","kpc8471", "kpc8472", "kpc1165"]

# sorted column list
cols = [ 'Server', 'Server Name', 'iLO FW', 'Firmware', 'Appliance Name']

# Reading CSV with filtered columns
df = pd.read_csv("Server-Inventory.csv", usecols=cols)

# match the search_list items from the column "Server Name"
df = df[df['Server Name'].astype(str).str.contains('|'.join(search_list))]
print(df)

DataFrame:

           Server                    Server Name            iLO FW                Firmware         Appliance Name
0  ENC2002, bay 10                      kpc2021   2.50 Sep 23 2016  I36 v2.52 (10/25/2020)  OV C7000 enclosures 1
1  ENC8023, bay 7                kpc8291.db.com   2.40 Dec 02 2015  I36 v2.52 (10/25/2020)  OV C7000 enclosures 1
2  enc8009, bay 12                kpc8471.db.com  2.61 Jul 27 2018  I42 v1.42 (06/20/2020)  OV C7000 enclosures 1
3  enc1011, bay 1                        kpc8479  2.55 Aug 16 2017  I36 v2.74 (10/21/2019)  OV C7000 enclosures 1
4  enc1014, bay 1                        kpc1168  2.70 May 07 2019  I36 v2.74 (11/13/2019)  OV C7000 enclosures 1

Result:

               Server Server Name            iLO FW                Firmware         Appliance Name
440   ENC2002, bay 10     kpc2021  2.55 Aug 16 2017  I36 v2.52 (10/25/2020)  OV C7000 enclosures 1
981    enc8023, bay 7     kpc8291  2.55 Aug 16 2017  I36 v2.52 (10/25/2020)  OV C7000 enclosures 2
2642  enc8009, bay 12     kpc8471  1.30 May 31 2018  I42 v1.42 (06/20/2020)                 ov7003

Thanks for the help and ideas.

Note: I need to tag items of the list search_list which is not matched!

user2023
  • 452
  • 5
  • 22

4 Answers4

2

I think you can try by comparing two list:

serverName_list=df['Server Name'].unique().tolist()

If all élément of Server Name column has thé same format, you should clean data with for example:

serverName_clean_list=[] 
for element in serverName_list:
    serverName_clean_list.append(element.split(".")[0])

And according to Python find elements in one list that are not in the other

import numpy as np
main_list = np.setdiff1d(serverName_clean_list, search_list).tolist()
# yields the elements in `list_2` that are NOT in `list_1`
Renaud
  • 2,709
  • 2
  • 9
  • 24
  • Thank you @Renaud, let me try. – user2023 May 12 '20 at 17:30
  • Its throwing an error `AttributeError: 'int' object has no attribute 'tolist'` for `serverName_list=df['Server Name'].nunique().tolist()` – user2023 May 12 '20 at 17:39
  • Sorry if forgot thé argument axis=1 in serverName_list=df['Server Name'].nunique(axis=1).tolist() – Renaud May 12 '20 at 17:47
  • NP, but its not taking that `TypeError: nunique() got an unexpected keyword argument 'axis'` i beleive it work as `df.nunique(axis=1)` however +1 for the best tried. – user2023 May 12 '20 at 17:56
2

Instead of using .str.contains, use .str.extractall to get exactly the substrings that match the items in your list. Then check which elements in the list matched to at least one thing either using .isin (or set logic).

pat = '(' + '|'.join(search_list) + ')'
#'(kpc2021|kpc8291|kpc8471|kpc8472|kpc1165)'

result = pd.DataFrame({'item': search_list})
result['in_df'] = result['item'].isin(df['Server Name'].str.extractall(pat)[0])

print(result)

      item  in_df
0  kpc2021   True
1  kpc8291   True
2  kpc8471   True
3  kpc8472  False
4  kpc1165  False

Using .str.extractall we get a Series of the substrings that we matched. There's a MultiIndex, the outer level is the original DataFrame index, the inner level is a counter for the number of items it matched on that line (.extractall can have multiple matches).

df['Server Name'].str.extractall(pat)[0]
#   match
#0  0        kpc2021
#1  0        kpc8291
#2  0        kpc8471
#Name: 0, dtype: object
ALollz
  • 57,915
  • 7
  • 66
  • 89
0

To return non-matches add a ~:

df = df[~df['Server Name'].astype(str).str.contains('|'.join(search_list))]
NYC Coder
  • 7,424
  • 2
  • 11
  • 24
  • NYC Coder, thanks for the answer, but i don't need it because i need to tag the Servers names which are not in CSV file from the List `search_list` apert from which are matched otherwise this will give you all non-matched Server from CSV file. – user2023 May 12 '20 at 17:10
0

Try with the ~ symbol:

df = df[~df['Server Name'].astype(str).str.contains('|'.join(search_list))]
print(df)
Renaud
  • 2,709
  • 2
  • 9
  • 24
  • Renaud , thanks for the answer, but i don't need it because i need to tag the Servers names which are not in CSV file from the List `search_list` apert from which are matched otherwise this will give you all non-matched Server from CSV file. – user2023 May 12 '20 at 17:10