I am reading a text file in which each row contains some numbers and letters.
The first number of each row is a unique ID, and I want to collect all rows that share the same ID into a separate list.

For example, if my list after reading the file is something like this:

[
  ['507', 'W', '1000', '1'],
  ['1', 'M', '6', '2'],
  ['1', 'W', '1400', '3'],
  ['1', 'M', '8', '8'],
  ['1', 'T', '101', '10'],
  ['507', 'M', '4', '12'],
  ['1', 'W', '1700', '15'],
  ['1', 'M', '7', '16'],
  ['507', 'M', '8', '20'],
  ...
]

The expected output should be the following:

[
  ['507', 'W', '1000', '1', '507', 'M', '4', '12', '507', 'M', '8', '20'],
  ['1', 'M', '6', '2', '1', 'M', '8', '8', '1', 'T', '101', '10', '1', 'W', '1700', '15', '1', 'M', '7', '16'],
  ...
]

and so on for all other IDs in the file.

All rows starting with "507" should be stored in one list, the rows starting with "1" in another, and so forth.

My current code:

fileName = '/home/salman/Desktop/input.txt'

# Read every row once, stripping the trailing newline and skipping blank lines
with open(fileName) as f:
  lineList = [line.rstrip('\n') for line in f if line.strip()]

# The first whitespace-separated field of each row is the ID
first_number = [i.split()[0] for i in lineList]

print("Rows in list:" + str(lineList))
print("First number in list : " + str(first_number))
# set() drops duplicates, leaving each distinct ID once
common_number = list(set(first_number))
print("Common Numbers in first number list : "+ str(common_number))
print("Repeated values and their indexes are :")
Jaideep Shekhar
  • Usually "unique ID" means the numbers don't repeat. In your example, however, the first number repeats, so it is not a unique ID. I recommend editing your question. Also share with us what approach you have tried so far! – The Next Programmer Feb 27 '20 at 16:10
  • @TheNextProgrammer – Logitech Flames Feb 27 '20 at 16:14
  • @LogitechFlames do you want the output as a list of lists? Or do you have a fixed set of IDs and just want to set the variables? – Jaideep Shekhar Feb 27 '20 at 16:18
  • @JaideepShekhar There is no fixed set of IDs, and the input file is quite large. Basically I want to copy all rows with the same ID into a list so that I can perform further calculations on them. – Logitech Flames Feb 27 '20 at 16:21

2 Answers

Something like this:

rows = [['507', 'W', '1000', '1'],
['1', 'M', '6', '2'],
['1', 'W', '1400', '3'],
['1', 'M', '8', '8'],
['1', 'T', '101', '10'],
['507', 'M', '4', '12'],
['1', 'W', '1700', '15'],
['1', 'M', '7', '16'],
['507', 'M', '8', '20']]

merged = {}
for row in rows:
  if row[0] in merged:
    merged[row[0]].extend(row[1:])
  else:
    # copy the row so that extending it later does not modify the original in rows
    merged[row[0]] = list(row)

print(merged)

Output:

{
'507': ['507', 'W', '1000', '1', 'M', '4', '12', 'M', '8', '20'], 
'1': ['1', 'M', '6', '2', 'W', '1400', '3', 'M', '8', '8', 'T', '101', '10', 'W', '1700', '15', 'M', '7', '16']
}

Or use .extend(row) in place of .extend(row[1:]) if you really want to repeat the ID.
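
As a variation on the loop above, the same grouping can be sketched with collections.defaultdict, which removes the if/else branch. This is only a sketch: the helper name group_rows is made up for illustration, the sample is shortened from the question, and it repeats the ID for every row as in the question's expected output.

```python
from collections import defaultdict


def group_rows(rows):
    """Concatenate all rows that share the same first element (the ID).

    group_rows is a hypothetical helper name, not part of the original answer.
    """
    merged = defaultdict(list)
    for row in rows:
        # extend() copies the items, so the input rows are never modified
        merged[row[0]].extend(row)
    return dict(merged)


rows = [['507', 'W', '1000', '1'],
        ['1', 'M', '6', '2'],
        ['507', 'M', '4', '12']]

grouped = group_rows(rows)
print(grouped)
```

A plain dict preserves insertion order (Python 3.7+), so the IDs come out in the order they first appear in the input.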

Cedric Druck
  • Is it possible to sum up the values at the third position after each M for each ID, and to find the lowest value at the third position after each W for each ID? – Logitech Flames Feb 27 '20 at 16:35

This is my attempt. First, please read the documentation on groupby: https://docs.python.org/3/library/itertools.html#itertools.groupby and note why it is important to sort your sequence first. Here the key is the first element of each list, so I sort by that. sorted: https://docs.python.org/3/howto/sorting.html

Flatten a list of lists: How to make a flat list out of list of lists?

Explanation: Sort the elements so that entries with the same key (i.e. the same first element) are consecutive. When the key changes, we know that all items with the previous key have been exhausted. So basically we need to find where the first element of consecutive entries changes. That's what the groupby object provides. It yields tuples of (key, group), where key is the first element that identifies each group and group is an iterator over all the lists with that key (it becomes a list of lists once unpacked). We unpack them and flatten them.
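
To see why the sorting step matters, here is a small sketch comparing groupby on unsorted and sorted input. The sample is shortened from the question, and the name get_id is just an illustrative choice. On unsorted input, groupby starts a new group every time the key changes, so the same ID can show up more than once.

```python
import itertools
from operator import itemgetter

rows = [['507', 'W', '1000', '1'],
        ['1', 'M', '6', '2'],
        ['507', 'M', '4', '12']]

get_id = itemgetter(0)  # the ID is the first element of each row

# Without sorting: a new group starts whenever the key changes
unsorted_keys = [key for key, _ in itertools.groupby(rows, key=get_id)]

# With sorting: rows with equal IDs are adjacent, so each ID appears once
sorted_rows = sorted(rows, key=get_id)
sorted_keys = [key for key, _ in itertools.groupby(sorted_rows, key=get_id)]

print(unsorted_keys)  # ['507', '1', '507']
print(sorted_keys)    # ['1', '507']
```

Note that the IDs are strings here, so the sort is lexicographic. That is enough for grouping, since equal IDs still end up next to each other.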

import itertools
lst = [
    ['507', 'W', '1000', '1'],
    ['1', 'M', '6', '2'],
    ['1', 'W', '1400', '3'],
    ['1', 'M', '8', '8'],
    ['1', 'T', '101', '10'],
    ['507', 'M', '4', '12'],
    ['1', 'W', '1700', '15'],
    ['1', 'M', '7', '16'],
    ['507', 'M', '8', '20']
]
lst = sorted(lst, key=lambda x: x[0])
groups = itertools.groupby(lst, key=lambda x: x[0])
groups = [[*group] for _, group in groups]

# 3rd element
grp_3rd = [[entry[2] for entry in group] for group in groups]

# you could sum it up right here
grp_3rd = [sum(float(entry[2]) for entry in group) for group in groups]

# or you could do to see each key and the corresponding sum i.e. {'1': 3222.0, '507': 1012.0}
grp_3rd = {group[0][0]: sum(float(entry[2]) for entry in group) for group in groups}

# continue on to your output: flatten each group (a list of rows) into one list
flatten = lambda list_: [item for sublist in list_ for item in sublist]
groups = [flatten(group) for group in groups]

output:

[['1', 'M', '6', '2', '1', 'W', '1400', '3', '1', 'M', '8', '8', '1', 'T', '101', '10', '1', 'W', '1700', '15', '1', 'M', '7', '16'],
 ['507', 'W', '1000', '1', '507', 'M', '4', '12', '507', 'M', '8', '20']]

Cedric's answer is easier to understand, so if you can follow that one more easily, here is how you could change it to also collect the sums.

rows = [['507', 'W', '1000', '1'],
['1', 'M', '6', '2'],
['1', 'W', '1400', '3'],
['1', 'M', '8', '8'],
['1', 'T', '101', '10'],
['507', 'M', '4', '12'],
['1', 'W', '1700', '15'],
['1', 'M', '7', '16'],
['507', 'M', '8', '20']]

# get the output and sum directly
merged = {}
for row in rows:
    if row[0] not in merged:
        merged[row[0]] = [[], 0]
    merged[row[0]][0].extend(row[1:])
    merged[row[0]][1] += float(row[2])

# get the output and the list of 3rd elements
merged = {}
for row in rows:
    if row[0] not in merged:
        merged[row[0]] = ([], [])
    merged[row[0]][0].extend(row[1:])
    merged[row[0]][1].append(float(row[2]))
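
For the follow-up asked in the comments (per ID, the sum of the third field over the M rows and the lowest third field over the W rows), one possible sketch along the same lines is shown here. It assumes the letter sits at row[1] and the number at row[2], as in the sample data; the names summarize, m_sum and w_min are made up for illustration.

```python
def summarize(rows):
    """Per ID: sum row[2] over 'M' rows and take the minimum row[2] over 'W' rows.

    Assumes row[1] is the letter and row[2] is numeric text, as in the sample.
    """
    stats = {}
    for row in rows:
        key, letter, value = row[0], row[1], float(row[2])
        entry = stats.setdefault(key, {'m_sum': 0.0, 'w_min': None})
        if letter == 'M':
            entry['m_sum'] += value
        elif letter == 'W':
            # None means no 'W' row has been seen yet for this ID
            if entry['w_min'] is None or value < entry['w_min']:
                entry['w_min'] = value
    return stats


rows = [['507', 'W', '1000', '1'],
        ['1', 'M', '6', '2'],
        ['1', 'W', '1400', '3'],
        ['1', 'M', '8', '8'],
        ['1', 'T', '101', '10'],
        ['507', 'M', '4', '12'],
        ['1', 'W', '1700', '15'],
        ['1', 'M', '7', '16'],
        ['507', 'M', '8', '20']]

stats = summarize(rows)
print(stats)
# {'507': {'m_sum': 12.0, 'w_min': 1000.0}, '1': {'m_sum': 21.0, 'w_min': 1400.0}}
```

Rows with any other letter (such as T) are ignored, and w_min stays None for an ID that has no W row.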
Buckeye14Guy
  • Is it possible to sum up the values at the third position after each M for each ID, and to find the lowest value at the third position after each W for each ID? – Logitech Flames Feb 27 '20 at 16:34
  • just in case the answer from @Cedric below is easier to follow I added changes to his code that may help achieve what you want. thanks – Buckeye14Guy Feb 27 '20 at 17:10
  • How could I achieve what is asked in this post with your code: https://stackoverflow.com/questions/60445521/find-average-number-lowest-maximum-number-from-list – Logitech Flames Feb 28 '20 at 05:23