I wrote a solution to this challenge based on this answer. It successfully handles the example case given, but not the actual case.
Challenge:
Given two strings
s
andt
,t
is a substring ofs
ift
is contained as a contiguous collection of symbols ins
(as a result,t
must be no longer thans
).The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position
i
ofs
is denoted bys[i]
.A substring of
s
can be represented ass[j:k]
, wherej
andk
represent the starting and ending positions of the substring ins
; for example, ifs = "AUGCUUCAGAAAGGUCUUACG"
, thens[2:5] = "UGCU"
.The location of a substring
s[j:k]
is its beginning positionj
; note thatt
will have multiple locations ins
if it occurs more than once as a substring ofs
(see the Sample below).
Given:
Two DNA strings
s
andt
(each of length at most 1 kbp).
Return:
All locations of
t
as a substring ofs
.
Sample Dataset:
GATATATGCATATACTT
ATAT
Sample Output:
2 4 10
For the sample, it works. Sure, you need to trim the formatting by hand, but that's only a couple of seconds of work.
The actual data and my generated output are not accepted.
Actual Dataset:
CAAATAGTCACACAATAGTCGGCTAAATAGTCAATAGTCAAATAGTCAGAGCTAATAGTCTAAATAGTCGAAAAATAGTCATCAATAGTCTAAATAGTCAATAGTCGGAATAGTCAAAATAGTCAATAGTCAATAGTCAATAGTCGACTAAATAGTCCCAATAGTCTCAGAAATAGTCAATAGTCGTAATAGTCAATAGTCTAATAGTCTAATAGTCCAATAGTCTGTCAAATAGTCAATAGTCCAATAGTCGTTTAATAGTCCCCTTTACCAATAGTCAATAGTCCGAATAGTCAGGAATAGTCAGCACTAATAGTCAATAGTCCTAATAGTCCCAATAGTCAAAATAGTCAATAGTCTAAATAATAGTCCTAGCAGAAGAATAGTCTAATAGTCGGCAATAGTCAATAGTCAAATAGTCAGAATAGTCAAATAGTCGAAATAGTCAATAGTCAATAGTCAAATAGTCAAATAGTCAATAGTCAAATAGTCAAATAGTCAAATAGTCGAATAGTCTGTAATAGTCAATAGTCCTTCAATAGTCTAATAGTCATTCAATAGTCAAGAAATAGTCGGGGGAATAGTCCGAATAGTCAAATAGTCAATAGTCGAATAGTCTAATAGTCAATAGTCTAATAGTCTGATAATAGTCAAATAGTCAATAGTCTAAATAGTCGCCTATGCCAATAGTCTTATCAAATAGTCTCTTAATAGTCTAATAGTCAATAGTCAATAGTCTAATAGTCATAATAGTCAATAGTCAAGGAATAGTCCCATAATAGTCAATAGTCTTAATAGTCCAAACGAAATAGTCTTAATAGTCCCTAATAGTCACTAATAGTCGTAATAGTCATAATAGTCCAATAGTCTAAATAGTCTGCAATAGTCAAATAGTCAAATAGTCCGTACAATAGTCTTAATAGTCTTTGCGGCTCAATAGTCTCATAATAGTC
AATAGTCAA
Actual Output (trimmed):
26 33 93 109 118 125 132
Code:
def find_substring_locations(long, short)
mpos = []
re = Regexp.new(short)
m = i = 0
m = re.match( long, i ) { |k| j = k.begin(0); i = j + 1; mpos << j } while m
return mpos
end
def plus_one(input)
arr = []
for i in input
arr << (i += 1)
end
arr
end
main_string = gets.chomp
sub_string = gets.chomp
plus_one(find_substring_locations(main_string, sub_string))
Where did I go wrong? It appears to be in order.
EDIT: Turns out the problem was a hiccup in the environment. After a reboot the problem could not be reproduced.