0

Excel's column A has many rows with different values. Example:

613
613
613
625
625
631
631...etc

I want to iterate through column A and when the cell's value refers to a new value, obtain the cell's row.

Expected result example: When the cell's value changed to 625, it will return the row 4. When 631, 6, and so on.

This is my code so far:

from openpyxl import Workbook, load_workbook

wb = load_workbook("Workbook.xlsx")
ws = wb.active
sheet = wb["Visual Query2"]

for row in ws.iter_rows(min_row=1, max_col=1, max_row=223181):
    for cell in row::
           print(cell.row)

I got a bit lost when trying to catch when the cell starts having a different value. If someone can provide me guidance, I will really appreciate it. Thanks.

Edit: Was able to find a temporary solution utilizing:

wb = load_workbook("Workbook.xlsx")
ws = wb.active
sheet = wb["Sheet"]

for fila in ws.iter_rows('A2:A7'):
    for celda in fila:
        if celda.value != celda.offset(-1, 0).value:
            print(celda.row)
Patriots299
  • 365
  • 3
  • 15

3 Answers3

1

Perhaps the most straightforward approach is using zip, but this will require loading all the rows to memory and creating 2 lists, which may be tricky if you actually have 2 million rows:

li = [1, 1, 2, 2, 3, 3, 1, 1, 2, 1, 2, 3]
print([index + 1 for index, (a, b) in enumerate(zip(li[:-1], li[1:])) if a != b])
#  [2, 4, 6, 8, 9, 10, 11]

A more conservative way will be to "cache" the last value:

li = [1, 1, 2, 2, 3, 3, 1, 1, 2, 1, 2, 3]
previous = None
for index, n in enumerate(li):
    if previous is None:
        previous = n
        continue
    if previous != n:
        print(index)
    previous = n

outputs

2
4
6
8
9
10
11
DeepSpace
  • 78,697
  • 11
  • 109
  • 154
  • It is a good approach. It does not really resolve the question because it does not implement OpenPyXl, but I have thumbed it up. I found an alternative that I have added as an edit. If you want to update your code, and it works, I will be marking your answer as the right one. – Patriots299 Jan 03 '19 at 03:04
1

In order to keep track of when the value changed, you can use a variable to record the last value:

def return_first_of_kind(iterable, key):
    previous_value = None
    for value in iterable:
        if key(value) != previous_value:
            yield value
        previous_value = key(value)

The you use it as:

def cell_a_value(row):
    return row[0].value

for row in return_first_of_kind(ws.iter_rows(), key=cell_a_value):
    print(row)

I forgot how to get the value of the first column from an openpyxl worksheet row, please update the cell_a_value function as needed.

Paulo Scardine
  • 73,447
  • 11
  • 124
  • 153
  • It is a good approach, but does not entirely resolve the question. Therefore I have just thumbed your post up. If you want to edit, and it works well, I will mark your answer as the right one. For now, I have found a temporary solution that I have added as an edit. – Patriots299 Jan 03 '19 at 03:07
-2

If you want to change as little as possible in your current code, I'd suggest storing the last value in some kind of temporary variable that you write over each time you move to the next row. You compare the current row's country code to the previous row's country code and, if they're different, you print the index of the row. If the index of the row isn't easily accessible, you can implement your own counter variable that tracks which index you're on. There may be more elegant solutions available, but these will work.

Here's an example of the implementation I described above:

i = 0
temp = None
for row in ws.iter_rows(min_row=1,max_col=1,max_row=223181):
    for cell in row:
        if cell.value != temp:
            print(i)
        temp = cell.value
    i += 1

You may want to come up with a different way to assign the temporary variable the first time. This will print the first value no matter what.

The highest upvoted solution here provides a more elegant solution for the issue of printing the index issue. Additionally, this documentation tells us there's a cleaner way of opening and reading the file (see the "Read-only" section).

tkiral
  • 49
  • 1
  • 9
  • Although I think the answer is technically correct a small code sample would make it way easier to understand. – Paulo Scardine Jan 03 '19 at 00:33
  • I generally don't do that to prevent undergraduate copy-pasting ;). I also think it'd be more helpful for learners to have to implement it themselves. I can add that in though. – tkiral Jan 03 '19 at 00:41
  • I appreciate you editing the post and providing more information. The problem with your code is that it will not differentiate whether there is a new value, but will only catch if the cell value is not blank. – Patriots299 Jan 03 '19 at 02:03
  • @FranJ Oops; I didn't include the temp update.....adding that in now. Based on the original explanation that was present, though, you'd know that this is updated through iteration and that I was missing that from my code – tkiral Jan 03 '19 at 17:33