
I have a text file output with the following format:

Line[0]:    ('["\'AA\'"]', '["\'BB\'"]', '["\'CC\'"]')
Line[1]:    ('["\'XYZ\'"]', '["\'YY\'"]', '["\'ZZ\'"]')
Line[2]:    ('["\'PP\'"]', '["\'QQ\'"]', '["\'RR\'"]')
Line[3]:    ('["\'XYZ\'"]', '["\'YY\'"]', '["\'ZZ\'"]')
Line[4]:    ('["\'PP\'"]', '["\'QQ\'"]', '["\'RR\'"]')
Line[5]:    ('["\'PP\'"]', '["\'QQ\'"]', '["\'RR\'"]')
Line[6]:    ('["\'AA\'"]', '["\'BB\'"]', '["\'CC\'"]')
Line[7]:    ('["\'XYZ\'"]', '["\'YY\'"]', '["\'ZZ\'"]')

I would like to find the duplicate strings in parentheses, count how many times each one repeats, and sort them in descending order of count, eliminating the redundant copies. I looked for similar posts that describe using the Counter class, but was not able to apply it in this context. I would like my output to look like this:

The Line[Num] prefixes are also part of the text file.

Expected output:

Line[0]:    ('["\'XYZ\'"]', '["\'YY\'"]', '["\'ZZ\'"]') Count=3
Line[1]:    ('["\'PP\'"]', '["\'QQ\'"]', '["\'RR\'"]') Count=3
Line[2]:    ('["\'AA\'"]', '["\'BB\'"]', '["\'CC\'"]') Count=2
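Since the question mentions Counter, here is a minimal sketch of how `collections.Counter` could apply, assuming every line looks exactly like the sample (a `Line[N]:` prefix, some whitespace, then the tuple text). The function name `count_tuples` is hypothetical. The prefix is stripped before counting so identical tuples on different lines compare equal, and `most_common()` does the descending sort.

```python
import re
from collections import Counter

# Matches the assumed "Line[N]:" prefix plus the whitespace that follows it.
PREFIX = re.compile(r"^Line\[\d+\]:\s*")

def count_tuples(lines):
    """Count identical tuple strings, ignoring the Line[N]: prefix (hypothetical helper)."""
    counts = Counter(PREFIX.sub("", line).strip() for line in lines if line.strip())
    # most_common() returns (text, count) pairs, highest count first;
    # ties keep the order in which each string was first seen.
    return [f"Line[{i}]:    {text} Count={n}"
            for i, (text, n) in enumerate(counts.most_common())]
```

Usage, with a hypothetical file name: `with open("output.txt") as f: print("\n".join(count_tuples(f)))`. Renumbering with `enumerate` produces the new `Line[0]`, `Line[1]`, ... labels shown in the expected output.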
Martijn Pieters
juggernaut

1 Answer


Split the input file on the newline character, then iterate through the lines. If the text file is in the format you describe, it might be prudent to use the regex module (`re`) to extract the strings inside the parentheses. Store each extracted string in a dictionary, with the string as the key and its number of occurrences as the value. When a string is encountered for the first time (it is not in the dictionary yet), store it with a value of one; otherwise, increment the value stored under that key.
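A minimal sketch of these dictionary-counting steps, assuming the tuple text always runs from the first `(` to the last `)` on a line. The names `TUPLE_RE` and `count_lines` are hypothetical.

```python
import re

# Greedy match from the first "(" to the last ")" on a line (assumed format).
TUPLE_RE = re.compile(r"\(.*\)")

def count_lines(text):
    """Map each parenthesized string to the number of times it occurs."""
    counts = {}
    for line in text.split("\n"):
        match = TUPLE_RE.search(line)
        if match is None:
            continue  # skip blank or malformed lines
        key = match.group(0)
        if key not in counts:
            counts[key] = 1   # first occurrence of this string
        else:
            counts[key] += 1  # repeated occurrence
    return counts
```

The explicit `if key not in counts` branch mirrors the description; `counts[key] = counts.get(key, 0) + 1` or `collections.Counter` would do the same in fewer lines.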

At the end, sort the dictionary items by value in descending order and output the strings along with their counts.
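The final sort-and-print step might look like this sketch. `format_sorted` is a hypothetical helper that takes any dict mapping strings to counts and returns lines in the format the question asks for.

```python
def format_sorted(counts):
    """Sort (string, count) pairs by count, highest first, and renumber them."""
    # sorted() is stable even with reverse=True, so strings with equal
    # counts stay in the order they were inserted into the dict.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [f"Line[{i}]:    {text} Count={n}" for i, (text, n) in enumerate(ordered)]
```

For example, `print("\n".join(format_sorted(counts)))` prints one renumbered line per distinct string.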

Community
The Velcromancer
  • Iterating through the lines of a text file in Python is simply: `with open(file) as f: for line in f: # do your stuff` (indentation not preserved because of the comment format...) – Serge Ballesta Jun 24 '14 at 19:25
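Written out with proper indentation, the loop from the comment above could look like this sketch. It assumes the `Line[N]` prefix is separated from the tuple by the first colon on each line; `count_lines_in` and `count_file` are hypothetical names. The counting is split from the file handling so it works on any iterable of lines.

```python
from collections import Counter

def count_lines_in(f):
    """Count tuple strings from any iterable of lines, such as an open file."""
    counts = Counter()
    for line in f:  # iterating a file object yields one line at a time
        _, _, rest = line.partition(":")  # drop the "Line[N]" prefix
        rest = rest.strip()
        if rest:
            counts[rest] += 1
    return counts

def count_file(path):
    # "with" closes the file automatically when the block exits.
    with open(path) as f:
        return count_lines_in(f)
```

Iterating the file object directly avoids reading the whole file into memory, unlike splitting its full contents on newlines.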