I have a df like this:
   text        labels
0  2083        [CARDINAL (0.8677)]
1  2085        [CARDINAL (0.5846)]
2  1822        [DATE (0.9581)]
3  DHAKA.      [GPE (0.6306)]
4  BANGLADESH  [GPE (0.6535)]
5  2085        [CARDINAL (0.7502)]
6  Manlkganj   [GPE (0.8888)]
7  Bangladesh  [GPE (0.9916)]
What I want is:
   text                  labels
0  2083, 2085            CARDINAL
1  1822                  DATE
2  DHAKA. BANGLADESH     GPE
3  2085                  CARDINAL
4  Manlkganj Bangladesh  GPE
I want to merge contiguous rows that share the same label into a single row (joining their text), and drop every row whose label is not in ls=['GPE', 'ORG', 'CARDINAL'].
I have done it in a non-Pythonic way, looping over the df with df.iterrows() and checking df['labels'].str.split('(')[0] in ls, but that takes a lot of time and does not give the desired result either. Is there a more efficient way to do this with string operations and row manipulation?
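To make the intended operation concrete, here is a rough sketch of the kind of vectorized approach I have in mind. It assumes the labels can be treated as strings such as 'CARDINAL (0.8677)' (via astype(str)), joins merged text with a single space, and applies the ls filter strictly, so the DATE row is dropped. The names name, run_id, merged and result are just illustrative.

```python
import pandas as pd

# Sample rows from the question; labels are assumed to be plain strings here.
df = pd.DataFrame({
    "text": ["2083", "2085", "1822", "DHAKA.", "BANGLADESH",
             "2085", "Manlkganj", "Bangladesh"],
    "labels": ["CARDINAL (0.8677)", "CARDINAL (0.5846)", "DATE (0.9581)",
               "GPE (0.6306)", "GPE (0.6535)", "CARDINAL (0.7502)",
               "GPE (0.8888)", "GPE (0.9916)"],
})

ls = ["GPE", "ORG", "CARDINAL"]

# Keep only the label name, dropping the "(score)" suffix.
name = df["labels"].astype(str).str.extract(r"([A-Z_]+)", expand=False)

# A new run starts wherever the label differs from the previous row,
# so the cumulative sum gives every contiguous run its own id.
run_id = name.ne(name.shift()).cumsum()

# Collapse each run into one row: join the text, keep the shared label.
merged = (
    df.assign(labels=name, run_id=run_id)
      .groupby("run_id", sort=True)
      .agg(text=("text", " ".join), labels=("labels", "first"))
)

# Drop runs whose label is not wanted, then renumber the rows.
result = merged[merged["labels"].isin(ls)].reset_index(drop=True)
print(result)
```

The runs are computed before filtering on purpose: filtering first could make two runs look adjacent when they were separated by a dropped label in the original data.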
The df in df.to_dict() format, to recreate it:
{'text': {0: '2083',
1: '2085',
2: '1822',
3: 'DHAKA.',
4: 'BANGLADESH',
5: '2085',
6: 'Manlkganj',
7: 'Bangladesh',
8: 'DHAKA',
9: 'BANGLADESH'},
'start_pos': {0: 49,
1: 54,
2: 107,
3: 236,
4: 243,
5: 355,
6: 396,
7: 414,
8: 540,
9: 547},
'end_pos': {0: 53,
1: 58,
2: 111,
3: 242,
4: 253,
5: 359,
6: 405,
7: 424,
8: 545,
9: 557},
'labels': {0: [CARDINAL (0.8677)],
1: [CARDINAL (0.5846)],
2: [DATE (0.9581)],
3: [GPE (0.6306)],
4: [GPE (0.6535)],
5: [CARDINAL (0.7502)],
6: [GPE (0.8888)],
7: [GPE (0.9916)],
8: [GPE (0.5669)],
9: [GPE (0.878)]}}
Thanks in advance.