
I have a dataframe with 15 separate ICD columns (ICD1 to ICD15) and want to create a variable "Encep" (0/1) that is 1 when the digits "323" appear in any of the 15 ICD columns.

The dataframe itself contains over 30 variables and looks like this:

PT_FIN    DATE     Address...     ICD1    ICD2...      ICD15
1         July      123 lane        523    432         .
2         August    ABC road        523    43.6       12.8

I'm not entirely sure I'm on the right track. I wrote the following code to accomplish this, but I'm getting an error:

CODE

ICDA = ["ICD1","ICD2","ICD3","ICD4","ICD5","ICD6","ICD7","ICD8","ICD9","ICD10","ICD11","ICD12","ICD13","ICD14","ICD15"]

ICD1.loc[:,"Encep"]=np.where(ICD1["ICDA"].str.contains("323", case=False),1,0)

ERROR

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2889             try:
-> 2890                 return self._engine.get_loc(key)
   2891             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'ICDA'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-34-564afcae6cd2> in <module>
      1 ICDA= ["ICD1","ICD2","ICD3","ICD4","ICD5","ICD6","ICD7","ICD8","ICD9","ICD10","ICD11","ICD12","ICD13","ICD14","ICD15"]
----> 2 ICD1.loc[:,"LumbPCode"]=np.where(ICD1["ICDA"].str.contains("323", case=False),1,0)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2973             if self.columns.nlevels > 1:
   2974                 return self._getitem_multilevel(key)
-> 2975             indexer = self.columns.get_loc(key)
   2976             if is_integer(indexer):
   2977                 indexer = [indexer]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2890                 return self._engine.get_loc(key)
   2891             except KeyError:
-> 2892                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2893         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2894         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'ICDA'

EDIT

I found a similar question and answer, but I need to know how to apply this to selected columns rather than the whole dataframe:

Finding string over multiple columns in Pandas

Raven

2 Answers


The KeyError comes from the fact that there is no column (i.e. no 'key') in your dataframe called ICDA.

Calling .str.contains on that column would make no sense even if it existed, since ICDA is a list of column names rather than a column of codes.
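
To make the difference between the quoted and unquoted name concrete, here is a minimal sketch on a made-up two-row frame (the name `df`, the values, and the shortened column list are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe
df = pd.DataFrame({"ICD1": ["523", "323.1"], "ICD2": ["432", "43.6"]})
ICDA = ["ICD1", "ICD2"]  # a Python list of column names

# df["ICDA"] looks for a column literally named "ICDA" and raises KeyError
try:
    df["ICDA"]
    raised = False
except KeyError:
    raised = True

# df[ICDA] (no quotes) selects the listed columns and returns a DataFrame
subset = df[ICDA]

# A DataFrame has no .str accessor; only a single column (a Series) does
frame_has_str = hasattr(subset, "str")
column_has_str = hasattr(df["ICD1"], "str")
```

So dropping the quotes fixes the KeyError, but the string methods still have to be applied one column at a time.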

Possible Solution

Did you try calling it without the quoted "ICDA"?

np.where(ICD1[ICDA].str.contains("323", case=False),1,0)

New Solution

The following should work.

ICDA = ["ICD1","ICD2","ICD3","ICD4","ICD5","ICD6","ICD7","ICD8","ICD9","ICD10","ICD11","ICD12","ICD13","ICD14","ICD15"]

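# .str.contains needs strings, so convert each column element-wise.
# (str(ICD1[col]) would collapse the whole column into one string.)
for col in ICDA:
    ICD1[col] = ICD1[col].astype(str)

# 1 if "323" appears anywhere in any ICD column of the row, else 0
ICD1['Encep'] = ICD1[ICDA].apply(lambda s: s.str.contains('323', regex=False)).any(axis=1).astype(int)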
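
As a rough illustration of why the string conversion and the kind of comparison matter, here is a small sketch on a made-up numeric column (the Series `s` and its values are assumptions, not the asker's data). It contrasts `str(series)` with `series.astype(str)`, and an exact match with a substring match:

```python
import pandas as pd

# Hypothetical ICD column stored as floats
s = pd.Series([323, 43.6, 12.8], name="ICD1")

# str() on a Series gives ONE string: the printed form of the whole column
whole = str(s)
is_single_string = isinstance(whole, str)

# .astype(str) converts element by element, keeping one value per row
each = s.astype(str)
first_value = each.iloc[0]  # the float 323.0 becomes the text "323.0"

# An exact comparison misses "323.0"; a substring search finds it
exact_hit = bool((each == "323").any())
substring_hit = bool(each.str.contains("323", regex=False).any())
```

Assigning `str(s)` back to a column would put that same long string in every row, and an exact `== '323'` test fails whenever the stored text carries a decimal part.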

For all future questions, make sure to create a minimal reproducible example :)

Jared Wilber
  • Good thought, but doing this produces an error: AttributeError: 'DataFrame' object has no attribute 'str' – Raven Jun 19 '20 at 19:38
  • I've updated the answer – Jared Wilber Jun 19 '20 at 23:06
  • Thank you! This code works great. I'm a Python newb so I was curious if you could please explain/annotate how this code works - or provide useful material that I can reference. Not sure why @prune marked this a duplicate as my question is not related to the duplicate he linked (oh well)! – Raven Jun 20 '20 at 19:56
  • @Jared_Wilber, on second thought, when I run this code, it's not catching any of the records that contain "323". – Raven Jun 23 '20 at 15:45

You have confused a literal string with a variable:

np.where(ICD1["ICDA"].str

There is no column "ICDA" in your table. Column names are the keys of a table; hence the error.

Hint: you might want to use the any function to check whether at least one column has the desired property. You might find it easier or faster to concatenate the entire row, and check to see whether "323" appears in that one string.
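
Both ideas in the hint can be sketched roughly as follows. The frame, its values, and the names `df`, `icd_cols`, and `Encep_joined` are assumptions for illustration; only three ICD columns are shown to keep it short.

```python
import pandas as pd

# Hypothetical sample; the real data has ICD1 to ICD15 and other columns
df = pd.DataFrame({
    "PT_FIN": [1, 2, 3],
    "ICD1": ["523", "523", "323.1"],
    "ICD2": ["432", "43.6", None],
    "ICD3": [None, "12.8", "110"],
})
icd_cols = ["ICD1", "ICD2", "ICD3"]

# Work on text only: missing codes become empty strings
as_text = df[icd_cols].fillna("").astype(str)

# Option 1: test each column, then ask whether ANY column matched per row
per_column = as_text.apply(lambda col: col.str.contains("323", regex=False))
df["Encep"] = per_column.any(axis=1).astype(int)

# Option 2: join each row into one string and search that string once.
# The "|" separator keeps neighbouring codes such as "32" and "3.1"
# from fusing into a false "323".
joined = as_text.apply(lambda row: "|".join(row), axis=1)
df["Encep_joined"] = joined.str.contains("323", regex=False).astype(int)
```

Because only `df[icd_cols]` is searched, the other columns (such as PT_FIN or an address) cannot trigger a match. Note that a plain substring search also flags codes like "1323"; if only codes beginning with 323 should count, `str.startswith` may be the better fit.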

Prune
  • I was thinking that I could concatenate the columns, and that is true, but I'm interested in learning a method that will answer my question in case I run into a similar problem going forward and concatenation is not an option. – Raven Jun 19 '20 at 19:57