
I have been struggling to find code that will give me the total for each first digit (1, 2, 3, ..., 9) individually, over all numbers in a text file. I also want to adapt that code to the second digit and to the first two digits. Is anyone able to help? What I have done so far is:

For the first digit:

fgrep -oE "[[:digit:]]{1,}" 'filename' | grep "^1"| wc -l

For the second digit:

fgrep -oE "[[:digit:]]{2,}" 'filename' | grep "^1"| wc -l

To get the result for the other digits (2, 3, ..., 9), I change "^1" to e.g. "^2" and so on. I am pretty sure the results I get from the second-digit command are wrong. I need urgent help, thanks! :)

2 Answers


Generally speaking, you can easily do something like this...

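import re

def sum_nums_in_text_by_indices(text, indices=slice(0,1)):
    # Find every run of digits, slice each run, and skip empty slices
    # (e.g. the second digit of a one-digit number) so int() cannot fail.
    digits = (n[indices] for n in re.findall(r'\d+', text))
    return sum(int(d) for d in digits if d)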

Examples:

>>> sum_nums_in_text_by_indices('123 123')              # first digits
2

>>> sum_nums_in_text_by_indices('123 123', slice(1,2))  # second digits
4

>>> sum_nums_in_text_by_indices('123 123', slice(0,2))  # first and second digits
24

To use this properly, you should familiarize yourself with Python's slice notation. (documentation: slice())
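As a quick illustration of that notation, here is a minimal sketch applied to a single run of digits. The value "2014" is just an assumed example, not taken from any real file:

```python
number = "2014"  # one run of digits, as re.findall would return it

first_digit = number[slice(0, 1)]       # same as number[0:1]
second_digit = number[slice(1, 2)]      # same as number[1:2]
first_two_digits = number[slice(0, 2)]  # same as number[0:2]

# Slicing past the end of a string gives an empty string, not an error,
# so a one-digit number has no second digit to pass to int().
missing_second_digit = "7"[slice(1, 2)]
```

Here `first_digit` is "2", `second_digit` is "0", `first_two_digits` is "20", and `missing_second_digit` is the empty string.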

The function can be further simplified:

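def sum_nums_in_text_by_indices(text, start=0, stop=1):
    # Raw-string pattern; empty slices from numbers that are too short are skipped.
    digits = (n[start:stop] for n in re.findall(r'\d+', text))
    return sum(int(d) for d in digits if d)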


>>> sum_nums_in_text_by_indices('123 123')
2

>>> sum_nums_in_text_by_indices('123 123', start=1, stop=2)
4

>>> sum_nums_in_text_by_indices('123 123', start=0, stop=2)
24
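
The `wc -l` pipelines in the question count matching numbers, so what is wanted may be how often each digit occurs rather than the sum of the digits. A minimal sketch under that assumption follows; `count_digits_by_position` and the sample sentence are hypothetical, not part of the function above:

```python
import re
from collections import Counter


def count_digits_by_position(text, start=0, stop=1):
    """Count how often each digit string occupies positions start:stop.

    Hypothetical helper: numbers too short to fill the whole slice
    are skipped instead of being counted.
    """
    width = stop - start
    pieces = (n[start:stop] for n in re.findall(r'\d+', text))
    return Counter(p for p in pieces if len(p) == width)


sample = "Revenue 1200, costs 950, profit 250 in 2014"
first = count_digits_by_position(sample)                       # first digits
second = count_digits_by_position(sample, start=1, stop=2)     # second digits
first_two = count_digits_by_position(sample, start=0, stop=2)  # first two digits
```

For this sample, `first` counts "2" twice and "1" and "9" once each, while `second` counts "5" twice and "2" and "0" once each.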
Inbar Rose
  • Thanks for your swift reply! Does this also work if I have a text file of, e.g., an annual statement whose content also includes words? Does it filter out the numbers automatically? As I understood it, your code works only for text files that consist of numbers only. – Rashid Arain Sep 06 '15 at 06:11
  • It finds all sequences of numbers in text. Try it yourself and see. If you want more specific help, you should provide more specific information. – Inbar Rose Sep 06 '15 at 06:51
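
To make the behaviour on mixed text concrete, here is a small sketch of what the pattern `\d+` extracts. The sentence is an invented example, not taken from a real annual statement:

```python
import re

line = "Net income for 2014 was 1,200.50 USD (2013: 980)."

# Each uninterrupted run of digits becomes one match; words are ignored.
digit_runs = re.findall(r'\d+', line)
print(digit_runs)  # ['2014', '1', '200', '50', '2013', '980']
```

Note that thousands separators and decimal points split a figure such as 1,200.50 into several runs, so formatted amounts may need to be normalized before their digits are counted.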

I'll just guess what you want.

Sample input file (path: /tmp/ggz):

240872014
3406121147
131
115388201300032
13022020149210000854942
124342014
1148272013102002
11975281552961075898430474
240872014
118113201520150113164711178

Count the occurrences of each digit in the first position of every line.

$ grep -oP "^[0-9]" /tmp/ggz | sort | uniq -c
  7 1       # Digit `1` has 7 occurrences.
  2 2       # Digit `2` has 2 occurrences.
  1 3       # Digit `3` has 1 occurrence.
            # No occurrences for digits `4` to `9` or `0`.

Count the occurrences of each two-digit prefix (the first two digits) of every line.

$ grep -oP "^[0-9]{2}" /tmp/ggz | sort | uniq -c
  4 11      # Number `11` has 4 occurrences.
  1 12
  2 13
  2 24
  1 34

Count the occurrences of each digit in the second position of every line.

$ grep -oP "(?<=^.)[0-9]" /tmp/ggz | sort | uniq -c
  4 1       # Digit `1` has 4 occurrences.
  1 2
  2 3
  3 4
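
For comparison, the same three counts can be sketched in Python. This assumes, as the grep commands do, one number per line; `count_line_prefixes` is a hypothetical helper name, and the sample data is copied from the input file above:

```python
import re
from collections import Counter

SAMPLE = """\
240872014
3406121147
131
115388201300032
13022020149210000854942
124342014
1148272013102002
11975281552961075898430474
240872014
118113201520150113164711178
"""


def count_line_prefixes(text, pattern):
    """Count capture group 1 of `pattern`, matched at the start of each line."""
    counts = Counter()
    for line in text.splitlines():
        match = re.match(pattern, line)  # re.match anchors at the line start
        if match:
            counts[match.group(1)] += 1
    return counts


first = count_line_prefixes(SAMPLE, r'(\d)')         # first digit
first_two = count_line_prefixes(SAMPLE, r'(\d{2})')  # first two digits
second = count_line_prefixes(SAMPLE, r'.(\d)')       # second digit
```

On this sample the three counters agree with the `uniq -c` output shown above.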
AnnieFromTaiwan