I think that you don't need to use fileinput:
num_lines = sum(1 for line in open('2grams.txt'))  # in order not to print junk
count = 0
intersect = open('2grams.txt', 'r')
out_file = open("output.txt", 'w')
scores = open("2gram_glues.txt", 'r')
with open('2mwus.txt', 'r') as base:
    for line in base:
        # Split the trailing number off the 2-gram
        line = line.rstrip()
        number = line[-2:]
        number = int(number.lstrip())
        line = line[:-2]
        line = line.rstrip()
        # Rewind both files so every 2-gram is compared against the full list
        intersect.seek(0, 0)
        scores_lines = scores.readlines()
        scores.seek(0, 0)
        for i, line_intersect in enumerate(intersect):
            line_intersect = line_intersect.rstrip()
            if line == line_intersect:
                print("**The 2 Gram is: " + line.strip() + "\t The score is: " + scores_lines[i] +
                      'The number is ' + str(number))
        count += 1
        if count >= num_lines:
            break
intersect.close()
out_file.close()
scores.close()
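The nested loop above rescans 2grams.txt once for every line of 2mwus.txt. As a minimal sketch of an alternative (a different technique, not what the code above does), the 2-grams and their scores could be paired once in a dictionary and then looked up directly. The names build_score_index and match_mwus are hypothetical, and the sketch assumes, as the code above does, that line i of 2gram_glues.txt holds the score for line i of 2grams.txt.

```python
def build_score_index(gram_lines, score_lines):
    """Map each 2-gram (trailing whitespace stripped) to its score string."""
    index = {}
    for gram, score in zip(gram_lines, score_lines):
        # setdefault keeps only the first occurrence of a repeated 2-gram
        index.setdefault(gram.rstrip(), score.strip())
    return index


def match_mwus(mwu_lines, index):
    """Yield (gram, score, number) for every MWU line whose 2-gram is indexed."""
    for line in mwu_lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        number = int(line[-2:].lstrip())  # same two-character slice as above
        gram = line[:-2].rstrip()
        if gram in index:
            yield gram, index[gram], number


if __name__ == "__main__":
    # In-memory stand-ins for the three files, using sample lines from this answer
    grams = ["phone but\n", "(850, 900,\n"]
    scores = ["0.1\n", "0.857143\n"]
    mwus = ["(850, 900,\t12 \n", "phone but\t2 \n"]
    for gram, score, number in match_mwus(mwus, build_score_index(grams, scores)):
        print(gram, score, number)
```

With real files, the open file objects can be passed in place of the lists, since both functions only iterate over lines. Unlike the scan above, a 2-gram that appears more than once in 2grams.txt is reported only once here.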
Slicing and stripping
From:
'(850,·900,\t12·'
'(frequencies·850,\t4·'
'phone·but\t2·'
# \t denotes a tab, · denotes a space
Using:
line = line.rstrip()
Makes:
'(850,·900,\t12'
'(frequencies·850,\t4'
'phone·but\t2'
Then get the number:
number = line[-2:]
Gives:
'12'
'\t4'
'\t2'
Then left-strip the number:
number = int(number.lstrip())
Gives:
12
4
2
Continuing with our "line":
'(850,·900,\t12'
'(frequencies·850,\t4'
'phone·but\t2'
Using:
line = line[:-2]
line = line.rstrip()
Gives:
'(850,·900,'
'(frequencies·850,'
'phone·but'
This is a bit hardcoded (the [-2:] slice assumes the number has at most two digits), but it avoids the need for a regex.
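If the number could grow past two digits, one possible variation is to split on the last tab instead of slicing a fixed width. This is only a sketch: split_gram_and_count is a hypothetical helper, and it assumes every line separates the 2-gram from the number with a tab, as in the samples above.

```python
def split_gram_and_count(line):
    """Split 'gram<TAB>count' into (gram, count) without fixed-width slicing."""
    # rsplit with maxsplit=1 splits only at the last tab in the line
    gram, count = line.rstrip().rsplit("\t", 1)
    return gram.rstrip(), int(count)


if __name__ == "__main__":
    print(split_gram_and_count("(850, 900,\t12 "))
    print(split_gram_and_count("phone but\t2 \n"))
```

A line without any tab raises ValueError at the unpacking step, so malformed input fails loudly instead of producing a wrong number.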
Output
**The 2 Gram is: (850, 900, The score is: 0.857143
The number is 12
**The 2 Gram is: (Bands 4 The score is: 0.4
The number is 2
**The 2 Gram is: (frequencies 850, The score is: 1
The number is 4
**The 2 Gram is: 1, 3, The score is: 1
The number is 8
**The 2 Gram is: 13, 25) The score is: 0.666667
The number is 2
**The 2 Gram is: 1800, 1900 The score is: 1
The number is 8
**The 2 Gram is: 1900, 2100 The score is: 1
The number is 10
**The 2 Gram is: 5 compatible The score is: 0.444444
The number is 2
**The 2 Gram is: A1428: UMTS/HSPA+/DC-HSDPA The score is: 0.5
The number is 2
**The 2 Gram is: A1429: UMTS/HSPA+/DC-HSDPA The score is: 0.4
The number is 2
**The 2 Gram is: Australia, Germany, The score is: 1
The number is 2
**The 2 Gram is: B (800, The score is: 1
The number is 2
**The 2 Gram is: Full specs The score is: 1
The number is 2
**The 2 Gram is: GSM model The score is: 0.428571
The number is 6
**The 2 Gram is: In deciding The score is: 1
The number is 2
**The 2 Gram is: KDDI network The score is: 0.5
The number is 2
**The 2 Gram is: South Korea). The score is: 1
The number is 2
**The 2 Gram is: UMTS/HSPA+/DC-HSDPA (850, The score is: 0.666667
The number is 6
**The 2 Gram is: US AT&T The score is: 1
The number is 2
**The 2 Gram is: US, along The score is: 1
The number is 2
**The 2 Gram is: bands 4 The score is: 0.4
The number is 2
**The 2 Gram is: bands, making The score is: 1
The number is 2
**The 2 Gram is: battery life The score is: 0.363636
The number is 2
**The 2 Gram is: blazing fast The score is: 1
The number is 2
**The 2 Gram is: didn't come The score is: 0.666667
The number is 3
**The 2 Gram is: fact that The score is: 0.4
The number is 3
**The 2 Gram is: iPhone 5 The score is: 0.526316
The number is 5
**The 2 Gram is: meet compatibility The score is: 1
The number is 2
**The 2 Gram is: model A1429: The score is: 0.5
The number is 4
**The 2 Gram is: networks in The score is: 0.258065
The number is 4
**The 2 Gram is: networks. However, The score is: 1
The number is 2
**The 2 Gram is: one GSM. The score is: 0.363636
The number is 2
**The 2 Gram is: phone but The score is: 0.1
The number is 2
**The 2 Gram is: phone. This The score is: 0.444444
The number is 2
**The 2 Gram is: release three The score is: 0.8
The number is 2
**The 2 Gram is: sim card The score is: 0.8
The number is 2
**The 2 Gram is: standards worldwide. The score is: 1
The number is 2
**The 2 Gram is: support LTE The score is: 0.296296
The number is 4
**The 2 Gram is: the phone The score is: 0.188679
The number is 10
**The 2 Gram is: to my The score is: 0.12
The number is 3
**The 2 Gram is: works great The score is: 0.4
The number is 2
Ideas to take home:
- Be aware of whitespace; rstrip is your ally.
- Using f1, f2 and f3 is intuitive, but in the long run you get confused. Use meaningful names!
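As a small illustration of the whitespace point, here is why a line read from a file can fail to match a string that looks identical on screen. The strings are made up for the example.

```python
raw = "phone but \n"  # as read from a file: trailing space and newline
wanted = "phone but"

print(raw == wanted)           # False: the invisible characters differ
print(repr(raw))               # repr() makes the trailing whitespace visible
print(raw.rstrip() == wanted)  # True once the right side is stripped
```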
Hope it helps!