Code to count 1st, 2nd and first-two digits from a text file

Question

I have been struggling to find code that will give me the sum of the first digit for all (1,2,3,...,9) numbers individually from a text file. I also want to adapt that code for the second digit and for the first two digits. Is anyone able to help? What I have done so far is:

For the first digit:

fgrep -oE "[[:digit:]]{1,}" 'filename' | grep "^1"| wc -l

For the second digit:

fgrep -oE "[[:digit:]]{2,}" 'filename' | grep "^1"| wc -l

To get the result for the other digits (2,3,...,9) I change "^1" to e.g. "^2" and so on. I am pretty sure the results I get from the second-digit command are wrong. I need urgent help, thanks! :)

And an example of your `Python` code. Have you tried regular expressions? — Inbar Rose, Sep 06 '15 at 05:45
Well, to be honest, all I did was paste the above commands into my OS X terminal and run them. I am not a computer guy whatsoever but need this to do some testing for my thesis. Would you be able to help? — Rashid Arain, Sep 06 '15 at 06:00

score 1 · Answer 1 · edited May 23 '17 at 10:26

Generally speaking, you can easily do something like this...

import re

def sum_nums_in_text_by_indices(text, indices=slice(0,1)):
    return sum(int(n[indices]) for n in re.findall(r'\d+', text))

Examples:

>>> sum_nums_in_text_by_indices('123 123')              # first digits
2

>>> sum_nums_in_text_by_indices('123 123', slice(1,2))  # second digits
4

>>> sum_nums_in_text_by_indices('123 123', slice(0,2))  # first and second digits
24

To use this properly, you should familiarize yourself with Python's slice notation. (documentation: slice())
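
As a quick illustration of the slice notation mentioned above, here is a minimal sketch. The variable names are made up for this example and are not part of the answer's function.

```python
number = "12345"

first = slice(0, 1)       # same as number[0:1]
second = slice(1, 2)      # same as number[1:2]
first_two = slice(0, 2)   # same as number[0:2]

print(number[first])      # 1
print(number[second])     # 2
print(number[first_two])  # 12

# Slicing past the end of a short string gives an empty string, not an error.
print(repr("7"[second]))  # ''
```

Note that int('') raises ValueError, so asking the function above for second digits would fail if the text contains a single-digit number.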

The function can be further simplified:

def sum_nums_in_text_by_indices(text, start=0, stop=1):
    return sum(int(n[start:stop]) for n in re.findall(r'\d+', text))


>>> sum_nums_in_text_by_indices('123 123')
2

>>> sum_nums_in_text_by_indices('123 123', start=1, stop=2)
4

>>> sum_nums_in_text_by_indices('123 123', start=0, stop=2)
24
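
If the goal is a tally per digit (how many numbers start with 1, with 2, and so on) rather than a sum, a variation along these lines might work. This is only a sketch: the function name `count_leading_digits` and the sample sentence are invented for illustration, and numbers too short for the requested positions are skipped by assumption.

```python
import re
from collections import Counter

def count_leading_digits(text, start=0, stop=1):
    """Tally the digits at positions start:stop of every number found in text."""
    counts = Counter()
    for number in re.findall(r'\d+', text):
        prefix = number[start:stop]
        # Assumption: skip numbers too short to have all requested positions.
        if len(prefix) == stop - start:
            counts[prefix] += 1
    return counts

# Made-up sample text that mixes words and numbers.
sample = "Revenue was 1250 in 2014 and 1310 in 2015, up 5 percent."

print(count_leading_digits(sample))        # first digits
print(count_leading_digits(sample, 1, 2))  # second digits
print(count_leading_digits(sample, 0, 2))  # first two digits
```

To run it on a file, pass the file's contents as `text`. Be aware that `\d+` treats every unbroken run of digits as one number, so a figure written as 1,250.75 would be counted as three separate numbers.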

Thanks for your swift reply! Does this also work if I have a text file of e.g. an annual statement whose content also includes words? Does it filter out the numbers automatically? As I understood it, your code works only for text files that consist of numbers only. — Rashid Arain, Sep 06 '15 at 06:11
It finds all sequences of numbers in text. Try it yourself and see. If you want more specific help, you should provide more specific information. — Inbar Rose, Sep 06 '15 at 06:51

AnnieFromTaiwan · Answer 2 · 2015-09-11T09:56:39.433

I'll just guess what you want.

Sample input file. (filepath: `/tmp/ggz`)

240872014
3406121147
131
115388201300032
13022020149210000854942
124342014
1148272013102002
11975281552961075898430474
240872014
118113201520150113164711178

Count the occurrences of each digit in the first position of every line.

$ grep -oP "^[0-9]" /tmp/ggz | sort | uniq -c
  7 1       # Digit `1` has 7 occurrences.
  2 2       # Digit `2` has 2 occurrences.
  1 3       # Digit `3` has 1 occurrence.
            # No occurrences for digits `4` to `9` and `0`

Count the occurrences of each two-digit number formed by the first two digits of every line.

$ grep -oP "^[0-9]{2}" /tmp/ggz | sort | uniq -c
  4 11      # Number `11` has 4 occurrences.
  1 12
  2 13
  2 24
  1 34

Count the occurrences of each digit in the second position of every line.

$ grep -oP "(?<=^.)[0-9]" /tmp/ggz | sort | uniq -c
  4 1       # Digit `1` has 4 occurrences.
  1 2
  2 3
  3 4
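
The three pipelines above could be approximated in Python roughly as follows. `grep -P` is a GNU grep option and may not be available in the grep shipped with OS X, so this may be useful there. The helper name `count_line_prefixes` is made up for this sketch, and only the first four lines of the sample file are used.

```python
import re
from collections import Counter

def count_line_prefixes(lines, pattern):
    """Count what group 1 of `pattern` captures at the start of each line."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        match = regex.match(line)  # match() only matches at the start of the line
        if match:
            counts[match.group(1)] += 1
    return counts

# First four lines of the sample input file.
lines = ["240872014", "3406121147", "131", "115388201300032"]

print(count_line_prefixes(lines, r'(\d)'))     # first digit
print(count_line_prefixes(lines, r'(\d{2})'))  # first two digits
print(count_line_prefixes(lines, r'.(\d)'))    # second digit
```

An open file object can be passed in place of the list, for example the result of open('/tmp/ggz'), since iterating over a file yields its lines.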
