
I have trouble applying the following to my series.

 Data['Notes']
 0       2018-06-07 09:38:14Z -- legal -- As per ...
 1       2018-06-05 12:48:26Z -- name -- Holdin...
 2       2018-06-05 17:15:48Z -- filing -- Answe...
 3       2018-06-11 08:34:53Z -- name -- lvm i...
 4       2018-05-11 08:31:26Z -- filed -- summo...
 5       2018-06-01 16:07:11Z -- Name Rogers -- sent ...

import re

keywords = {'file', 'filing', 'legal'}
max_words_after  = 5

key_re = re.compile(fr"""
(?:{'|'.join([w for w in keywords])})   #keyword options group
\s((?:[\s]?[A-Za-z\']+[\s]?)    #capture word. include with line-breaks
{{1,{max_words_after}}})                #1 to max_words_after
""", re.VERBOSE|re.IGNORECASE
)

for f in data['Notes']:
    data['Result'] = key_re.findall(f)

In response, all I get is

"ValueError: Length of values does not match the length of index."

Please tell me how I can get a result for every index position and append it to a new series within the data frame.

  • You are overwriting data['Result'] on every loop iteration. Other than that, we need to know what data is in order to help. See https://stackoverflow.com/questions/42382263/valueerror-length-of-values-does-not-match-length-of-index-pandas-dataframe-u – Zev Jun 13 '18 at 16:59
  • I am sorry for not including the data but it looks something like this: – user9937345 Jun 13 '18 at 17:02
  • 0 2018-06-07 09:38:14Z -- Name -- As per ... 1 2018-06-05 12:48:26Z -- Name -- Holdin... 2 2018-06-05 17:15:48Z -- Name -- Answe... – user9937345 Jun 13 '18 at 17:03
  • With 0, 1, 2 each starting a new index row – user9937345 Jun 13 '18 at 17:03
  • Please reply by editing your post with the data. It's hard to format it here and it is important to your question. Preferably, include it with your code in a way where your code can be copied and run. – Zev Jun 13 '18 at 17:06
  • Hi Zev, is this what you meant by adding the data? – user9937345 Jun 13 '18 at 19:13

1 Answer


Understanding your error

key_re.findall(f) returns a list whose length varies from note to note (I expect 0 or 1 matches per note, but depending on your regular expression it could be more).

Inside the loop you assign that list to the entire 'Result' column, and the list of course doesn't have one item per row of your dataframe. Hence "Length of values does not match the length of index."
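The error can be reproduced in isolation. This is a minimal sketch, assuming a made-up three-row frame (the `df` name and its contents are placeholders, not your data): assigning a one-element list to a column raises the same ValueError.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the real data.
df = pd.DataFrame({"Notes": ["a legal note", "a name note", "a filing note"]})

message = None
try:
    # One list item for three rows: pandas refuses the assignment.
    df["Result"] = ["legal"]
except ValueError as exc:
    message = str(exc)

print(message)
```

The exact wording of the message differs slightly between pandas versions, but the cause is the same: a list assigned to a column needs exactly one item per row.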

I don't think that's what you want to do anyway. I think you want to create a new column based on another column. See this question for details, but here it is applied to your situation.

Fixing your code

import re
import pandas as pd

Here's what I was looking for regarding your Data variable. Something I can copy and paste and run:

Data = pd.DataFrame([["2018-06-07 09:38:14Z -- legal -- As per ..."],["2018-06-05 12:48:26Z -- name -- Holdin..."]], columns=["Notes"])

Create a function that does the transformation that you want.

def find_key_words(row):
    keywords = {'file', 'filing', 'legal'}
    max_words_after  = 5

I'm only including the first line of your regular expression because, when I tested your complete expression, I never got any results. You can modify this as you need.

    key_re = re.compile(fr"""
        (?:{'|'.join([w for w in keywords])})   #keyword options group
        """, re.VERBOSE|re.IGNORECASE
    )
    return key_re.findall(row['Notes'])
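As an aside on why the complete expression returned nothing on the sample rows: in those notes the keyword is followed by " -- ", and the word group only accepts letters and apostrophes right after a single whitespace character. The sketch below uses a single-line rendering of the question's pattern as I read it, plus a loosened variant. The `\W+` separator and the names `strict_re` and `loose_re` are my assumptions, not something from the question.

```python
import re

note = "2018-06-07 09:38:14Z -- legal -- As per ..."

# Single-line reading of the question's pattern: keyword, one whitespace
# character, then 1 to 5 words made of letters and apostrophes.
strict_re = re.compile(
    r"(?:file|filing|legal)\s((?:\s?[A-Za-z']+\s?){1,5})", re.IGNORECASE
)

# Assumed variant: allow any run of non-word characters (such as " -- ")
# between the keyword and the words that follow it.
loose_re = re.compile(
    r"(?:file|filing|legal)\W+((?:[A-Za-z']+\s?){1,5})", re.IGNORECASE
)

strict_hits = strict_re.findall(note)
loose_hits = [words.strip() for words in loose_re.findall(note)]

print(strict_hits)  # []
print(loose_hits)   # ['As per']
```

Whether the loosened separator is appropriate depends on how the real notes are formatted, so treat it as a starting point only.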

Now apply that function to each row. With axis=1, apply returns one result per row, so the values you assign match the length that Data['Result'] expects.

Data['Result'] = Data.apply(find_key_words, axis=1)
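If only the matched keywords are needed, the same column can probably be built without a helper function by using the pandas string accessor. This is a sketch under that assumption; the `notes` and `pattern` names are mine, and the keywords are kept in a list so the alternation order is fixed.

```python
import re
import pandas as pd

notes = pd.DataFrame(
    [["2018-06-07 09:38:14Z -- legal -- As per ..."],
     ["2018-06-05 12:48:26Z -- name -- Holdin..."]],
    columns=["Notes"],
)

keywords = ["file", "filing", "legal"]
# Escape each keyword so it is matched literally, then join as alternatives.
pattern = "|".join(re.escape(word) for word in keywords)

# str.findall runs re.findall on every row and returns one list per row,
# so the result always lines up with the index.
notes["Result"] = notes["Notes"].str.findall(pattern, flags=re.IGNORECASE)

print(notes["Result"].tolist())  # [['legal'], []]
```

Rows with no keyword get an empty list rather than being dropped, which keeps the new column the same length as the frame.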
Zev