Split text into individual rows in Python

Question

I'm struggling to split an array of text into individual tokens. I'm currently using the following code, but it doesn't produce the exact output I want: I would like the numbers to be part of the output too.

text = df.head(3)[['processed_arti', 'cluster']].values    # where df is a pandas DataFrame

terms = [b for l in text for b in zip(l[0].split(" "))]
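In the spirit of the MRE requested in the comments, here is a minimal sketch with hypothetical sample rows standing in for `df.head(3)[['processed_arti', 'cluster']].values` (the real data is only shown in a picture). It suggests why the numbers go missing: `zip` called with a single iterable yields one-element tuples, and `l[1]` is never read. The wanted output is presumably `(word, cluster)` pairs, as the answers below assume.

```python
# Hypothetical sample rows: [processed text, cluster number].
# Iterating over .values on the real DataFrame yields rows of this shape.
text = [
    ["apple banana cherry", 0],
    ["dog cat", 1],
    ["red green blue", 2],
]

# The original attempt: zip() over one iterable wraps each word
# in a 1-tuple, and the cluster number l[1] is never used.
terms = [b for l in text for b in zip(l[0].split(" "))]
print(terms[:3])
# [('apple',), ('banana',), ('cherry',)]
```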

I've added another picture below showing in more detail how the data looks; it is read into a pandas DataFrame.

I'd really appreciate any help on this. Thanks in advance.

Could you please provide an MRE? stackoverflow.com/help/minimal-reproducible-example — Rafael Valero, Feb 09 '21 at 18:18
`terms = [b for l in text for b in itertools.product(l[0].split(" "), l[1])]` ? `import itertools` — Epsi95, Feb 09 '21 at 18:20
Thank you @RafaelValero for your response. I've added a few more details to the question above. — ALK, Feb 09 '21 at 18:33
Thank you @Epsi95 for your response. When I try itertools, I get the following error: "TypeError: 'int' object is not iterable" — ALK, Feb 09 '21 at 18:34
@ALK, if you could copy and paste the code instead of pictures, that would be great. If you post pictures, people have to retype code you already have. — Rafael Valero, Feb 09 '21 at 18:36
Thank you @RafaelValero for the recommendation and the help. I am sorted now. — ALK, Feb 09 '21 at 18:44
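The `TypeError` from the `itertools.product` suggestion is likely because `product` needs every argument to be iterable, while the cluster label `l[1]` is a plain `int`. A minimal sketch of one possible fix, assuming hypothetical rows shaped like the DataFrame's `.values`, wraps the label in a one-element list:

```python
import itertools

# Hypothetical rows shaped like the DataFrame's .values output.
text = [["apple banana", 0], ["dog cat", 1]]

# itertools.product requires iterables; the int label l[1]
# is wrapped in a one-element list so it can be paired with each word.
terms = [b for l in text for b in itertools.product(l[0].split(" "), [l[1]])]
print(terms)
# [('apple', 0), ('banana', 0), ('dog', 1), ('cat', 1)]
```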

Yevhen Kuzmovych · Answer 1 · 2021-02-09T18:31:58.180

2

Isn't this what you need? You just need to add the number alongside your words:

terms = [(b, n) for l, n in text for b in l.split(" ")]
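A short usage sketch of this comprehension, assuming hypothetical sample rows. Each row unpacks into the text `l` and its number `n`. The second row deliberately contains a double space to show a side effect of `split(" ")`: it produces an empty-string token, whereas `split()` with no argument splits on runs of whitespace and does not.

```python
# Hypothetical rows; in the question they come from
# df.head(3)[['processed_arti', 'cluster']].values
text = [["apple banana cherry", 0], ["dog  cat", 1]]  # note the double space

# Each row unpacks into (l, n): the text and its cluster number.
terms = [(b, n) for l, n in text for b in l.split(" ")]
print(terms)
# [('apple', 0), ('banana', 0), ('cherry', 0), ('dog', 1), ('', 1), ('cat', 1)]

# split() without a separator collapses runs of whitespace,
# so no empty-string tokens appear.
terms_clean = [(b, n) for l, n in text for b in l.split()]
print(terms_clean)
# [('apple', 0), ('banana', 0), ('cherry', 0), ('dog', 1), ('cat', 1)]
```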

edited Feb 09 '21 at 18:31

answered Feb 09 '21 at 18:24

Yevhen Kuzmovych


Tarik · Accepted Answer · 2021-02-09T18:34:28.167

1

First, build a list of lists containing your tuples:

[[(word, l[1]) for word in l[0].split(' ')] for l in a]  # a being your array

Then you flatten the list of lists: see How to make a flat list out of list of lists?
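For the flattening step, one common option is `itertools.chain.from_iterable`. This is a minimal sketch of the two-step approach, assuming a hypothetical stand-in for the array `a` and splitting on a space:

```python
import itertools

# Hypothetical stand-in for the question's array.
a = [["apple banana", 0], ["dog cat", 1]]

# Step 1: one inner list of (word, cluster) tuples per row.
nested = [[(word, l[1]) for word in l[0].split(" ")] for l in a]

# Step 2: chain the inner lists together into a single flat list.
flat = list(itertools.chain.from_iterable(nested))
print(flat)
# [('apple', 0), ('banana', 0), ('dog', 1), ('cat', 1)]
```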

Or better, as Yevhen Kuzmovych suggested:

[(word, l[1]) for l in a for word in l[0].split(' ')]

Note: Not verified. Typed on my mobile.

edited Feb 09 '21 at 18:34

answered Feb 09 '21 at 18:25

Tarik


You can avoid flattening by removing the inside brackets and swapping the order of the `for` loops (like OP originally does and like I did in my answer) – Yevhen Kuzmovych Feb 09 '21 at 18:29
Thanks @YevhenKuzmovych, I should have thought about it. – Tarik Feb 09 '21 at 18:32
Thank you @YevhenKuzmovych for your help – ALK Feb 09 '21 at 18:38
Thank you @Tarik for your help. – ALK Feb 09 '21 at 18:38
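Why swapping the order of the `for` clauses removes the need to flatten: in a comprehension with several `for` clauses, they run left to right as nested loops, outer loop first. A minimal sketch, assuming a hypothetical stand-in for the array, compares the single comprehension with the equivalent explicit loops:

```python
# Hypothetical stand-in for the question's array.
a = [["apple banana", 0], ["dog cat", 1]]

# Single comprehension: the for clauses read left to right,
# outer loop (rows) first, inner loop (words) second.
flat = [(word, l[1]) for l in a for word in l[0].split(" ")]

# The equivalent explicit nested loops.
result = []
for l in a:
    for word in l[0].split(" "):
        result.append((word, l[1]))

assert flat == result
print(flat)
# [('apple', 0), ('banana', 0), ('dog', 1), ('cat', 1)]
```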
