I have a dataframe that contains a column with references I'd like to extract to a list.
The references in the rows are mixed with other reference types.
stage, source, references,
open, phone, CAZ101AH.BA6642 2.266.548.55 20-44-255,
open, email, L.22.365.211.3 CAZ125AH.BA0842,
open, phone, 01-02-225 CAZ141AH.BA1342,
open, email, CAP101AH.BA2142,
open, email, 01-56-655,
open, phone, 03-07-555 2.655.585.55
open, email, L.22.365.566.5 CAZ101AH.BA6642,
I'd like to extract a certain type of reference (specifically the ones that start with "CA...")
The regex match for the reference type is '\w+[.]\w{6}'
The closest I've got so far is:
df['references'].str.findall(r'\w+[.]\w{6}').tolist()
which outputs
['CAZ101AH.BA6642'],
['CAZ125AH.BA0842'],
['CAZ141AH.BA1342'],
['CAP101AH.BA2142'],
[],
[],
['CAZ101AH.BA6642'],
How would i extract a list without the brackets, and empty colums for non matched rows etc?
ideally I'd like to generate a list with similar output to .value_counts() with
CAZ101AH.BA6642 2
CAZ125AH.BA0842 1
CAZ141AH.BA1342 1
CAP101AH.BA2142 1