
But now I wish to convert it to the following tuple format:

((1231, 123), (2341, 1210), (342,12), (5462, 565))

I really need to find a way to convert this data to the format directly above. I would greatly appreciate any help!

How to convert a string into pairs of tuples? I have already tried this:

with open("data.txt") as f:
    list = [line.rstrip('\n') for line in f] 
    mylist = [mylist[x:x+1] for x in range(0, len(mylist), 3)]
    print(mylist)


data = ['I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money']

I want my output to be in this format

[('i', 'went'),('to', 'work'),('but', 'got').........]

I have tried this, but it is not working:


import itertools
import nltk
import collections
f=open('readme.txt','r')
data=f.read()
print(data)
d1 = data[0].split() 
output = list(itertools.zip_longest(d1[::2],d1[1::2],fillvalue = None)) 
print(output)

Edited from comment - File content:

['भिन्केन NNP डच NNP प्रकाशन NN समूह NN एल्सेभियर NNP एन.भी. FB को PKO अध्यक्ष NN हुनुहुन्छ VBF । YF कन्सोलिडेटिड NNP गोल्ड NN फिल्ड्स NN पीएलसी NNP का PKO पूर्व JJ सभापति NN ५५ CD वर्षीय JJ रूडोल्फ NNP अग्न्यु NNP लाई PLAI यस DUM ब्रिटिस NNP औद्योगिक JJ समूह NN को PKO सल्लाहकार NN को PKO रूप NN मा POP मनोनयन NN गरिएको VBKO थियो VBX । YF एकताका RBO केन्ट NNP चुरोट NN को PKO फिल्टर NN बनाउन VBI प्रयोग NN भएको VBKO एक CD प्रकार NN को PKO अस्बेस्टोस NNP '] 
Patrick Artner
  • What if there is an odd number of words in the string? And why is `data` a list? – DeepSpace May 21 '18 at 10:08
  • Don't use `list` (or any other builtin name) as a variable name; it shadows the builtin and you get problems later. Format your code: copy and paste it into your question, select it, and hit **{}** to format it as code. – Patrick Artner May 21 '18 at 10:10

2 Answers

2

You can use itertools.zip_longest, which also works for zipping lists of unequal length: it pads the shorter list with a fill value (None unless you specify otherwise).

Split the data at whitespace and feed two slices of the result to zip_longest: one starting at index 0 and one starting at index 1, each taking every second element:

data = ['I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money']

import itertools
d1 = data[0].split() 

# use 2 partial lists, each taking every 2nd word, once starting at 0, once at 1
# you can change   fillvalue=None   to some other value or remove it - None is the default.
output = list(itertools.zip_longest(d1[::2],d1[1::2], fillvalue = None)) 

print(output)

Output:

[('I', 'went'), ('to', 'work'), ('but', 'got'), ('delayed', 'at'), ('other', 'work'), 
 ('and', 'got'), ('stuck', 'in'), ('a', 'traffic'), ('and', 'I'), ('went', 'to'), 
 ('drink', 'some'), ('coffee', 'but'), ('got', 'no'), ('money', 'and'), 
 ('asked', 'for'), ('money', None)]
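
To see what the fill value does when the number of words is odd, here is a minimal sketch comparing plain zip, which silently drops the unpaired last word, with zip_longest and a custom fillvalue. The sample sentence and the placeholder '<pad>' are made up for illustration:

```python
import itertools

# Hypothetical sample with an odd number of words (5)
words = "I went to work today".split()

# zip stops at the shorter slice, so the last word 'today' is dropped
dropped = list(zip(words[::2], words[1::2]))

# zip_longest keeps it and pads the missing partner with the fill value
padded = list(itertools.zip_longest(words[::2], words[1::2], fillvalue="<pad>"))

print(dropped)  # [('I', 'went'), ('to', 'work')]
print(padded)   # [('I', 'went'), ('to', 'work'), ('today', '<pad>')]
```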

The sublists fed to zip_longest look like:

print(d1[::2])

['I', 'to', 'but', 'delayed', 'other', 'and', 'stuck', 'a', 'and', 'went', 'drink', 
 'coffee', 'got', 'money', 'asked', 'money']

and

print(d1[1::2])

['went', 'work', 'got', 'at', 'work', 'got', 'in', 'traffic', 'I', 'to', 'some', 
 'but', 'no', 'and', 'for']
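
The two slices can also be replaced by a single iterator that is consumed twice per tuple. This is a different technique from the slicing shown above, sketched here only as an alternative; the helper name pair_up is made up for this example:

```python
import itertools

def pair_up(words, fill=None):
    """Group consecutive items into 2-tuples, padding the last one with fill."""
    it = iter(words)
    # Passing the same iterator twice makes zip_longest take two items per tuple
    return list(itertools.zip_longest(it, it, fillvalue=fill))

print(pair_up("I went to work but".split()))
# [('I', 'went'), ('to', 'work'), ('but', None)]
```

Because it never builds the two intermediate lists, this variant also works on inputs that can only be iterated once, such as a generator of words.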

The following part is adapted from "Convert string representation of list to list":

# -*- coding: utf-8 -*-

import ast

# create your file as utf8
with open("myfile.txt","w", encoding="utf8") as f:
    f.write("['भिन्केन NNP डच NNP प्रकाशन NN समूह NN एल्सेभियर NNP एन.भी. FB को PKO अध्यक्ष NN हुनुहुन्छ VBF । YF कन्सोलिडेटिड NNP गोल्ड NN फिल्ड्स NN पीएलसी NNP का PKO पूर्व JJ सभापति NN ५५ CD वर्षीय JJ रूडोल्फ NNP अग्न्यु NNP लाई PLAI यस DUM ब्रिटिस NNP औद्योगिक JJ समूह NN को PKO सल्लाहकार NN को PKO रूप NN मा POP मनोनयन NN गरिएको VBKO थियो VBX । YF एकताका RBO केन्ट NNP चुरोट NN को PKO फिल्टर NN बनाउन VBI प्रयोग NN भएको VBKO एक CD प्रकार NN को PKO अस्बेस्टोस NNP ']")

# load your file, using utf8
with open("myfile.txt","r",encoding="utf8") as f:
    data = f.read()
# convert the loaded string literal into a python list    
dataAsList = ast.literal_eval(data)

print(dataAsList)
print(type(dataAsList))

import itertools
d1 = dataAsList[0].split() 

# use 2 partial lists, each taking every 2nd word, once starting at 0, once at 1
# you can change   fillvalue=None   to some other value or remove it - None is the default.
output = list(itertools.zip_longest(d1[::2],d1[1::2], fillvalue = None)) 

print(output)

Output:

['भिन्केन NNP डच NNP प्रकाशन NN समूह NN एल्सेभियर NNP एन.भी. FB को PKO अध्यक्ष NN हुनुहुन्छ VBF । YF कन्सोलिडेटिड NNP गोल्ड NN फिल्ड्स NN पीएलसी NNP का PKO पूर्व JJ सभापति NN ५५ CD वर्षीय JJ रूडोल्फ NNP अग्न्यु NNP लाई PLAI यस DUM ब्रिटिस NNP औद्योगिक JJ समूह NN को PKO सल्लाहकार NN को PKO रूप NN मा POP मनोनयन NN गरिएको VBKO थियो VBX । YF एकताका RBO केन्ट NNP चुरोट NN को PKO फिल्टर NN बनाउन VBI प्रयोग NN भएको VBKO एक CD प्रकार NN को PKO अस्बेस्टोस NNP ']

<class 'list'>

[('भिन्केन', 'NNP'), ('डच', 'NNP'), ('प्रकाशन', 'NN'), ('समूह', 'NN'), 
 ('एल्सेभियर', 'NNP'), ('एन.भी.', 'FB'), ('को', 'PKO'), ('अध्यक्ष', 'NN'), 
 ('हुनुहुन्छ', 'VBF'), ('।', 'YF'), ('कन्सोलिडेटिड', 'NNP'), ('गोल्ड', 'NN'), 
 ('फिल्ड्स', 'NN'), ('पीएलसी', 'NNP'), ('का', 'PKO'), ('पूर्व', 'JJ'), 
 ('सभापति', 'NN'), ('५५', 'CD'), ('वर्षीय', 'JJ'), ('रूडोल्फ', 'NNP'), 
 ('अग्न्यु', 'NNP'), ('लाई', 'PLAI'), ('यस', 'DUM'), ('ब्रिटिस', 'NNP'), 
 ('औद्योगिक', 'JJ'), ('समूह', 'NN'), ('को', 'PKO'), ('सल्लाहकार', 'NN'), 
 ('को', 'PKO'), ('रूप', 'NN'), ('मा', 'POP'), ('मनोनयन', 'NN'), 
 ('गरिएको', 'VBKO'), ('थियो', 'VBX'), ('।', 'YF'), ('एकताका', 'RBO'), 
 ('केन्ट', 'NNP'), ('चुरोट', 'NN'), ('को', 'PKO'), ('फिल्टर', 'NN'), 
 ('बनाउन', 'VBI'), ('प्रयोग', 'NN'), ('भएको', 'VBKO'), ('एक', 'CD'), 
 ('प्रकार', 'NN'), ('को', 'PKO'), ('अस्बेस्टोस', 'NNP')] 
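
If the real file holds the plain text itself rather than a Python list literal, ast.literal_eval raises an error instead of returning a list. The following is a minimal sketch that handles both cases, assuming the file is UTF-8 encoded; the function name load_pairs and the fallback to plain text are assumptions for illustration, not something confirmed by the question:

```python
import ast
import itertools

def load_pairs(path):
    """Read a UTF-8 file and return its whitespace-separated tokens as pairs."""
    with open(path, "r", encoding="utf8") as f:
        raw = f.read().strip()
    try:
        # Case 1: the file contains a list literal such as "['word TAG ...']"
        parsed = ast.literal_eval(raw)
        if isinstance(parsed, list):
            text = " ".join(str(item) for item in parsed)
        else:
            text = str(parsed)
    except (ValueError, SyntaxError, TypeError):
        # Case 2 (assumed): the file contains the plain text itself
        text = raw
    tokens = text.split()
    # Pair every token with its successor; an unpaired last token gets None
    return list(itertools.zip_longest(tokens[::2], tokens[1::2]))
```

Calling load_pairs("myfile.txt") on the file written above should give the same list of (word, tag) tuples, and a file containing only the bare words and tags should give the same result.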
Patrick Artner
  • thank you so much but then i need to import data and obtain this output. [('I', 'went'), ('to', 'work'), ('but', 'got'), ('delayed', 'at'), ('other', 'work'), ('and', 'got'), ('stuck', 'in'), ('a', 'traffic'), ('and', 'I'), ('went', 'to'), ('drink', 'some'), ('coffee', 'but'), ('got', 'no'), ('money', 'and'), ('asked', 'for'), ('money', None)] the above program is working for manually typing the data but not working for reading file can you suggest me? – Asiis Pradhan May 21 '18 at 11:04
  • @AsiisPradhan - Edit your question and paste the "file" content, indent each line of the file by 4 to make it look like the original file -< the 'code' environment respects linebreaks. – Patrick Artner May 21 '18 at 11:34
  • I have edited my question can you suggest what went wrong? – Asiis Pradhan May 21 '18 at 12:09
  • @AsiisPradhan Open your _FILE_ in an editor. Copy& paste all the text of the file into your question. I need to see the files layout to help. You can use `varName = """"""` - it is a multiline string – Patrick Artner May 21 '18 at 12:10
  • -Still not giving the desire output. I want my output in the following form **[('भिन्केन', 'NNP'),(' डच',' NNP'),(' प्रकाशन',' NN')] – Asiis Pradhan May 21 '18 at 12:39
  • -ok i will ask a new question – Asiis Pradhan May 21 '18 at 12:45
  • @AsiisPradhan edited again - see last part - for me it works. If you create a new question, put in a link to this one so others can crossreference – Patrick Artner May 21 '18 at 13:06
  • I cant thank you enough..thank you so much, i really appreciate it. – Asiis Pradhan May 22 '18 at 04:08
  • --its only working for this inputs--f.write("['भिन्केन NNP डच NNP प्रकाशन NN समूह NN एल्सेभियर NNP एन.भी. FB को PKO अध्यक्ष NN हुनुहुन्छ VBF । YF कन्सोलिडेटिड NNP गोल्ड NN फिल्ड्स NN पीएलसी NNP का PKO पूर्व JJ सभापति NN ५५ CD वर्षीय JJ रूडोल्फ NNP अग्न्यु NNP लाई PLAI यस DUM ब्रिटिस NNP औद्योगिक JJ समूह NN को PKO सल्लाहकार NN को PKO रूप NN मा POP मनोनयन NN गरिएको VBKO थियो VBX । YF एकताका RBO केन्ट NNP चुरोट NN को PKO फिल्टर NN बनाउन VBI प्रयोग NN भएको VBKO एक CD प्रकार NN को PKO अस्बेस्टोस NNP ']") Its not working for the data in the file. How can i over come that? – Asiis Pradhan May 22 '18 at 04:49
  • @AsiisPradhan [How to debug small programs (#1)](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) - if you cant fix it yourself, create a new questions, link to this one, and add demo-data to your new question so SO has data to work on regarding your parsing problem. – Patrick Artner May 22 '18 at 04:59
  • I am trying but not getting the desire result. and i am not allowed to ask another question for another day or two. – Asiis Pradhan May 22 '18 at 05:24
0
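# split the single string in the list into words
splittedData = data[0].split(' ')
# pad with None when the word count is odd, so every word has a partner
if len(splittedData) % 2 != 0:
  splittedData.append(None)
output_list = []
# step through the words two at a time and pair each word with its neighbour
for x in range(0, len(splittedData), 2):
  output_list.append((splittedData[x], splittedData[x+1]))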
Surya Tej