
I have a huge CSV file with a column that contains descriptions of user problems, for example: 1. "Please reset my password - User name is xxxx" 2. "My phone voicemail is not working" 3. "I have a broken desk"

I am trying to create a generator in Python that reads this column and yields every pair of consecutive words. For the example above, it should yield: ('Please reset', 'reset my', 'my password', 'password -',.... 'My phone', 'phone voicemail',... 'I have', 'have a'....)

Note that I need a generator, not a list, because the file is huge. I can create a generator of single words ('Please', 'reset', 'my', 'password'...), but I cannot work out how to combine adjacent words into pairs.

To create the generator of single words, I am using: word = (word for row in csv.reader(f) for word in row[3].lower().split())

Bhanu

2 Answers

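# words is assumed to be the list of words from one description, e.g. row[3].lower().split()
listofwords = [words[i] + " " + words[i + 1] for i in range(len(words) - 1)]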
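
The expression above builds a full list, while the question asks for a generator. A minimal sketch of a lazy variant follows, assuming `words` is the list of words from a single description; the function name `bigram_phrases` is made up for illustration.

```python
def bigram_phrases(words):
    """Lazily yield 'w1 w2' strings for each pair of adjacent words in a list."""
    # Parentheses instead of brackets make this a generator expression,
    # so no list of pairs is ever held in memory.
    return (words[i] + " " + words[i + 1] for i in range(len(words) - 1))


for phrase in bigram_phrases("my phone voicemail is not working".split()):
    print(phrase)
```

Only one description is split into a list at a time, so memory use stays bounded by the longest description rather than by the size of the file.
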
timgeb
sam

You're looking for a rolling (or sliding) window iterator; see the question "Rolling or sliding window iterator". The accepted answer to that question is reproduced below, though I suggest reading through the other answers there:

from itertools import islice

def window(seq, n=2):
    """Return a sliding window (of width n) over data from the iterable.

    s -> (s0, s1, ..., s[n-1]), (s1, s2, ..., sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result
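
For the two-word case specifically, Python 3.10 and later ship `itertools.pairwise`, which yields the same tuples as a width-2 window. The sketch below falls back to a `tee`-based equivalent on older versions; the sample sentence is only illustrative.

```python
try:
    from itertools import pairwise  # available in Python 3.10+
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        """Fallback: yield (s0, s1), (s1, s2), ... from any iterable."""
        a, b = tee(iterable)
        next(b, None)  # advance the second copy by one element
        return zip(a, b)


words = "please reset my password".split()
for pair in pairwise(words):
    print(pair)
```

Like `window`, `pairwise` is lazy and yields nothing for an input with fewer than two items.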

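So for every line, we can get a window iterator over the words of that line, then use itertools.chain.from_iterable to flatten those iterators into a single one. Because each line gets its own window, no pair spans two descriptions.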

import csv
from itertools import chain

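with open('file.txt', newline='') as f:  # the csv docs recommend newline=''
    r = csv.reader(f)
    # Lazily split the fourth column of each row into lowercase words
    descriptions = (line[3].lower().split() for line in r)
    # map() is lazy in Python 3, so each window is only built when consumed
    iterators = map(window, descriptions)
    # Flatten the per-row windows into one stream of pairs
    final = chain.from_iterable(iterators)
    for item in final:
        print(item)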

For the file

,,,a b c
,,,d e f

this would print

('a', 'b')
('b', 'c')
('d', 'e')
('e', 'f')
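
The question asks for strings such as 'please reset' rather than tuples. One possible way to get them, sketched here under a few assumptions: the name `bigram_strings` is made up, the description is taken to be in column index 3 as in the question, rows that are too short are skipped, and an in-memory `io.StringIO` stands in for the real file.

```python
import csv
import io


def bigram_strings(rows, column=3):
    """Lazily yield 'w1 w2' strings for adjacent words in one column of each row."""
    for row in rows:
        if len(row) <= column:
            continue  # assumption: silently skip rows without that column
        words = row[column].lower().split()
        # zip pairs each word with its successor; pairs never cross rows
        for first, second in zip(words, words[1:]):
            yield first + " " + second


# Stand-in for the real file, same content as the two-line example above
sample = io.StringIO(",,,a b c\n,,,d e f\n")
for phrase in bigram_strings(csv.reader(sample)):
    print(phrase)
```

Only the words of the current row are held in memory, so this should scale to a large file in the same way as the `window` version.
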
Patrick Haugh