Extracting an element from a list of strings in a dataframe (non-destructively)

Question

This question was previously marked as a duplicate of another post. I do not think that post addresses the problem, although a detail missing from my original wording may have led to that conclusion, so I am posting it again with one minor edit.

I have a dataframe with one column that looks like this:

Row 1: ['EC']
Row 2: ['EP', 'PY']
...
Row 13978: ['EC']

This column was created by applying a regex to a column of strings that contained the above two-letter codes, in parentheses, for fault conditions in successive runs of a simulation code:

df1['Error Code(s)'] = df1['Error Code(s)'].apply(lambda x: re.findall(r'\((.*?)\)', x))
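
A minimal sketch of what that regex returns for different inputs. The sample strings are made up, since the real column contents are not shown here. The point to notice is that `re.findall` returns an empty list, not an error, when a string contains no parenthesized code:

```python
import re

# Hypothetical sample strings; the real data is not shown in the question.
samples = [
    "Run failed (EC)",
    "Run failed (EP) then (PY)",
    "Run completed normally",  # no parentheses at all
]

# Raw string so the backslashes reach the regex engine unchanged
pattern = re.compile(r"\((.*?)\)")

results = [pattern.findall(s) for s in samples]
for s, codes in zip(samples, results):
    print(f"{s!r} -> {codes}")
```

Any row of the third kind would produce `[]`, and indexing `[0]` into it raises `IndexError: list index out of range`.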

Now, for a step in the analysis, I need to access the first element in the error code list for each run. I wrote:

df1['Error Code 1']=df1['Error Code(s)'].apply(lambda x: x[0])

The error I get:

IndexError: list index out of range

I have gone to the extent of writing the dataframe to an Excel file and reading the column back to make sure that each entry is a list of strings. When I create a dummy list of strings and reference element [0] of that list (without a dataframe involved), it works fine. But throw in the lambda function on the dataframe column, and it chokes.

I have quickly scanned through the entire column in the Excel file, and none of the lists look odd in any way, so this is not just good code choking on bad data.

I have also tried using re.search() instead of re.findall(), with .group[0] in the lambda function, and I get a different error. Either way, I am unable to move ahead. I am a novice with Python, so I am sure I am missing something basic.
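
For the `re.search()` route, two things commonly go wrong, and either could plausibly explain the different error (the question does not say which error it was): `group` is a method, so it is called with parentheses as in `.group(1)`, and `re.search` returns `None` when nothing matches. A minimal sketch, using a hypothetical helper `first_code` and made-up strings:

```python
import re

pattern = re.compile(r"\((.*?)\)")

def first_code(text):
    """Return the first parenthesized code in text, or None if there is none."""
    match = pattern.search(text)
    # search() returns None when nothing matches, so check before calling group()
    return match.group(1) if match is not None else None

first_with = first_code("Run failed (EP) then (PY)")
first_without = first_code("Run completed normally")
print(first_with, first_without)
```

If this applies, something like `df1['Error Code(s)'].apply(first_code)` on the original string column would give the first code directly, with `None` marking the rows that have no match.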

Check if it works with this lambda instead: ```lambda x: x[0] if len(x) > 0 else x``` If yes it means that your regex had not found a match for some lines — Sebastian, May 22 '20 at 16:51
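
A sketch of how the suggestion above could be checked, assuming the cause is indeed empty lists. The small `df1` here is a hypothetical stand-in for the real dataframe, and the result for empty lists is `None` rather than the empty list itself, which is a choice made for this example:

```python
import pandas as pd

# Hypothetical stand-in for df1; the third row mimics a string with no match.
df1 = pd.DataFrame({"Error Code(s)": [["EC"], ["EP", "PY"], [], ["EC"]]})

# Locate rows where the regex found nothing (empty lists)
empty_rows = df1[df1["Error Code(s)"].str.len() == 0]
print(empty_rows)

# Guarded extraction: None instead of IndexError for empty lists
df1["Error Code 1"] = df1["Error Code(s)"].apply(lambda x: x[0] if x else None)

# Alternative: the .str accessor also indexes into lists and
# gives a missing value when the list is too short
first_via_str = df1["Error Code(s)"].str[0]
```

Filtering on the list length is more reliable than scanning roughly 14,000 rows in Excel by eye, where a lone `[]` is easy to miss.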
