How to find the total number of occurrences of the characters of a word in a string?

Question

I'm new to Python, and I would like to find a substring in a string.

For example, if I have a substring of some constant letters such as:

substring = 'sdkj'

And a string of some letters such as:

string = 'sdjskjhdvsnea'

I want a counter that is incremented by 1 for every occurrence of the letters s, d, k, or j in the string. For the example above, the counter should be 8.

How can I achieve this?

Please update your question so it's clear what you really want, because your question and the accepted answer don't match. – Tim Jun 22 '14 at 15:09
@TimCastelijns The second part of the accepted answer works perfectly for the question; the first part is for finding a whole substring in a string. – Hakar Jun 22 '14 at 17:28

score 2 · Accepted Answer · edited Jun 22 '14 at 15:07

Maybe this code can help you:

>>> string = 'sdjskjhdvsnea'
>>> substring = 'sdkj'
>>> counter = 0
>>> for x in string:
...     if x in substring:
...         counter += 1
...
>>> counter
8
>>>
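
The same loop can be wrapped in a reusable function. The sketch below is one possible variant, not part of the original answer: the name `count_letters` is hypothetical, and converting `substring` to a set is an assumption made so that each membership test takes constant time on average, which addresses the `O(len(string)*len(substring))` concern raised in the comments.

```python
def count_letters(string, substring):
    """Count the characters of `string` that also appear in `substring`."""
    # A set gives the same result as `x in substring`, with faster lookups.
    letters = set(substring)
    counter = 0
    for x in string:
        if x in letters:
            counter += 1
    return counter


print(count_letters('sdjskjhdvsnea', 'sdkj'))  # prints 8
```

Because the check is a membership test, a letter repeated in `substring` is still counted only once per occurrence in `string`.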

edited Jun 22 '14 at 15:07

Tim

answered Jun 22 '14 at 11:29

Tok Soegiharto

Just to clarify! The "if x in substring:" is inside the "for x in string:"-loop. Kind of hard to see. – Willy Jun 22 '14 at 11:31
Yes right, if x ... is inside for x in string: loop. – Tok Soegiharto Jun 22 '14 at 11:35
@hakar, I just want to know if this is the right answer. If so, feel free to mark it as the correct answer; otherwise, I can improve the answer. Thanks. – Tok Soegiharto Jun 22 '14 at 11:36
Oh, thank you a lot, it really worked. But what if we want to find the whole substring in the string? For example, if string = 'sdkjhsgshfsdkj', the counter should equal 2 in this case. – Hakar Jun 22 '14 at 11:36
@Hakar that is a totally different question, and (per my answer) what is usually meant by *"finding a substring"*. – jonrsharpe Jun 22 '14 at 11:38
@jonrsharpe So how do I do that one? Can you tell me, please? I mean the second question. – Hakar Jun 22 '14 at 11:42
@Hakar at the risk of self-promotion, why not read my answer? – jonrsharpe Jun 22 '14 at 11:43
@TokSoegiharto note that this approach is `O(len(string)*len(substring))`, so will not be efficient if those strings get larger. – jonrsharpe Jun 22 '14 at 11:48
@jonrsharpe Excuse me, which comment? Did you post the code for finding an entire substring in a string? I haven't seen it. – Hakar Jun 22 '14 at 11:49
@TokSoegiharto no problem - it's unlikely to matter in this trivial case, but something to be aware of. – jonrsharpe Jun 22 '14 at 11:51
@jonrsharpe, I just wanted to help. Thanks again. – Tok Soegiharto Jun 22 '14 at 11:53

score 1 · Answer 2 · edited Jun 22 '14 at 11:46

Edit:

As you apparently do want the count of the appearances of the whole four-character substring, regex is probably the easiest method:

>>> import re
>>> string = 'sdkjhsgshfsdkj'
>>> substring = 'sdkj'
>>> len(re.findall(substring, string))
2

re.findall will give you a list of all (non-overlapping) appearances of substring in string:

>>> re.findall('sdkj', 'sdkjhsgshfsdkj')
['sdkj', 'sdkj']
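
For a literal substring, the built-in `str.count` method also counts non-overlapping occurrences, without a regular expression. If `re.findall` is used with arbitrary input, `re.escape` keeps characters such as `.` from being treated as pattern syntax. A minimal sketch; the `'a.b axb a.b'` example string is made up for illustration and does not come from the question:

```python
import re

string = 'sdkjhsgshfsdkj'
substring = 'sdkj'

# str.count counts non-overlapping occurrences of a literal substring.
counter = string.count(substring)
print(counter)  # prints 2

# re.escape makes the pattern match the text literally.
escaped_count = len(re.findall(re.escape('a.b'), 'a.b axb a.b'))
# Without escaping, '.' matches any character, so 'axb' is counted too.
unescaped_count = len(re.findall('a.b', 'a.b axb a.b'))
print(escaped_count, unescaped_count)  # prints 2 3
```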

Normally, "finding a sub-string 'sdkj'" would mean trying to locate the appearances of that complete four-character substring within the larger string. In this case, it appears that you simply want the sum of the counts of those four letters:

sum(string.count(c) for c in substring)

Or, more efficiently, use collections.Counter:

from collections import Counter

counts = Counter(string)
sum(counts.get(c, 0) for c in substring)

This iterates over string only once, rather than once for each c in substring, so it is O(m + n) rather than O(m * n) (where m == len(string) and n == len(substring)).

In action:

>>> string = "sdjskjhdvsnea"
>>> substring = "sdkj"
>>> sum(string.count(c) for c in substring)
8
>>> from collections import Counter
>>> counts = Counter(string)
>>> sum(counts.get(c, 0) for c in substring)
8

Note that you may want set(substring) to avoid double-counting:

>>> sum(string.count(c) for c in "sdjks")
11
>>> sum(string.count(c) for c in set("sdjks"))
8
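
Putting these pieces together, a small helper might look like the following. This is a sketch under stated assumptions: the function name `count_chars` is hypothetical, and it assumes each distinct letter of `substring` should be counted only once.

```python
from collections import Counter


def count_chars(string, substring):
    """Sum the occurrences in `string` of each distinct character of `substring`."""
    counts = Counter(string)  # single pass over string
    # Counter returns 0 for missing keys, so plain indexing is safe here.
    return sum(counts[c] for c in set(substring))


print(count_chars("sdjskjhdvsnea", "sdkj"))   # prints 8
print(count_chars("sdjskjhdvsnea", "sdjks"))  # prints 8
```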

edited Jun 22 '14 at 11:46

answered Jun 22 '14 at 11:32

jonrsharpe

`len(re.findall(substring, string))`: this one is great, but how do I save the value in a variable `counter` in this case? – Hakar Jun 22 '14 at 11:57
@Hakar uh... `counter = len(...)`?! – jonrsharpe Jun 22 '14 at 11:58
Yes, I fixed that in another way, but there is a problem: what if the substring starts and ends with the same letter? Let me explain with an example: substring = 'sdks', string = 'sdksjhgsdksdks'. – Hakar Jun 22 '14 at 12:02
@Hakar Per the documentation I have already linked to, `re.findall` is **non-overlapping**. If you have overlapping substrings, consider a [moving window approach](http://stackoverflow.com/q/6822725/3001761) or use [`re.match`](http://stackoverflow.com/q/5616822/3001761). – jonrsharpe Jun 22 '14 at 12:04
I think my case is not overlapping, as I looked at the links you gave. What I want is this: the first letter of the substring is the same as its last letter, and in the string we have a concatination of the substring with that letter in common. For example: substring = 'sdks', string = 'sdksjhgsdksdkshjhsdks', so the counter will be three in this case, as there are two sdks and an sdksdks, which will be treated as two, not one, because the s in the middle is the last letter of the first one and the first letter of the second one. – Hakar Jun 22 '14 at 17:47
@Hakar you realised that what you just described is overlapping, right? Also, it's "concatenation". – jonrsharpe Jun 22 '14 at 17:50
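
The overlapping case discussed in these comments can be handled by restarting the search one character after each match. A minimal sketch using `str.find`; the function name `count_overlapping` is hypothetical and is not taken from the linked answers, and treating an empty substring as zero matches is an assumption:

```python
def count_overlapping(string, substring):
    """Count occurrences of `substring` in `string`, allowing overlaps."""
    if not substring:
        return 0  # assumption: an empty substring counts as no match
    counter = 0
    start = string.find(substring)
    while start != -1:
        counter += 1
        # Resume one character later so overlapping matches are found.
        start = string.find(substring, start + 1)
    return counter


print(count_overlapping('sdksjhgsdksdks', 'sdks'))  # prints 3
print('sdksjhgsdksdks'.count('sdks'))               # prints 2 (non-overlapping)
```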

score 1 · Answer 3 · answered Jun 22 '14 at 11:36

An alternative solution using re.findall():

>>> import re
>>> substring = 'sdkj'
>>> string = 'sdjskjhdvsnea'
>>> len(re.findall('|'.join(list(substring)), string))
8
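
If `substring` may contain characters that are special in regular expressions, such as `.` or `|`, each character should be escaped before joining. A hedged sketch of that variant; the function name `count_chars_regex` is hypothetical, and the `'a.b.c'` input is made up for illustration:

```python
import re


def count_chars_regex(string, substring):
    """Count characters of `string` that match any character of `substring`."""
    if not substring:
        return 0  # an empty pattern would match at every position
    # Escape each character so '.', '|', '*' and so on are matched literally.
    pattern = '|'.join(re.escape(c) for c in set(substring))
    return len(re.findall(pattern, string))


print(count_chars_regex('sdjskjhdvsnea', 'sdkj'))  # prints 8
print(count_chars_regex('a.b.c', 'a.b'))           # prints 4
```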

answered Jun 22 '14 at 11:36

Amal Murali

Did you really mean to include `// 8`? – jonrsharpe Jun 22 '14 at 11:37
@jonrsharpe: Erm, that was meant to be a comment. I should have used `#` instead. Anyway, updated! :) – Amal Murali Jun 22 '14 at 11:38
