I am writing an algorithm to process a string, and it crashes (possibly due to a quirk in Prolog which makes string-intensive algorithms crash). How can I modify the algorithm so that it doesn't crash?
The algorithm replaces `"`, `“`, `”`, `‘` and `’` with `'`, replaces `\\` and `-` with nothing, and breaks the string on `\n\n`.
It takes the following input files from `raw_sources`:

`1.txt`:
a
a

B
b
b

C
c
c
`2.txt`:
“”‘’'"
\\
-
b
It writes the following output files to `sources`:

`1.txt`:
["a
a","B
b
b","C
c
c"]
`2.txt`:
["''''''","
b"]
The query:
sheet_feeder(_).
The code so far:
sheet_feeder(T) :-
    directory_files("raw_sources/",F),
    delete_invisibles_etc(F,G),
    findall(K1,(member(H,G),
        atom_concat('raw_sources/',H,String00b),
        phrase_from_file(string(String001), String00b),
        string_codes(String000,String001),
        string_concat(String000,"\n\n",String00_a),
        strip_illegal_chars(String00_a,"",String00),
        split_on_substring(String00,"\n\n",[],J1),
        delete(J1,"",K1),
        term_to_atom(K1,K),
        string_concat("sources/",H,String00bb),
        (open(String00bb,write,Stream1),
        write(Stream1,K),
        close(Stream1))
    ),T).
delete_invisibles_etc(F,G) :-
    findall(J,(member(H,F),
        atom_string(H,J),
        not(J="."),not(J=".."),not(string_concat(".",_,J))),G).
string(String) --> list(String).
list([]) --> [].
list([L|Ls]) --> [L], list(Ls).
% strip_illegal_chars(+Input, +Acc, -Output):
% consumes Input from the front and builds the result in Acc.
strip_illegal_chars("",A,A) :- !.
% Case 1: drop a backslash.
strip_illegal_chars(A,B,E) :-
    string_concat(E1,D,A),
    string_length(E1,1),
    E1="\\",
    string_concat(B,"",F),
    strip_illegal_chars(D,F,E),!.
% Case 2: drop the two-character sequence "- " (hyphen, then space).
strip_illegal_chars(A,B,E) :-
    string_concat(E1,D,A),
    string_length(E1,2),
    E1="- ",
    string_concat(B,"",F),
    strip_illegal_chars(D,F,E),!.
% Case 3: replace any of the quote characters with '.
strip_illegal_chars(A,B,E) :-
    string_concat(E1,D,A),
    string_length(E1,1),
    ((E1="\"" -> true;
    (E1="“" -> true;
    (E1="”" -> true;
    (E1="‘" -> true;
    (E1="’" -> true;
    (E1="'"))))))),
    string_concat(B,"'",F),
    strip_illegal_chars(D,F,E),!.
% Case 4: copy any other character unchanged.
strip_illegal_chars(A,B,E) :-
    string_concat(C,D,A),
    string_length(C,1),
    string_concat(B,C,F),
    strip_illegal_chars(D,F,E),!.
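% split_on_substring(+Input, +Separator, +Acc, -Pieces):
% uses append/3, so Input, Separator and Acc are treated as lists of codes.
split_on_substring([],_A,E,[E]) :- !. %% ***?
% Input starts with the separator: close the current piece and start a new one.
split_on_substring(A,B,E,C) :-
    append(B,D,A),
    split_on_substring(D,B,[],C1),
    string_codes(E1,E),
    append([E1],C1,C),!.
% Otherwise move one element from the input to the end of the accumulator.
split_on_substring(A,B,E1,C) :-
    length(E,1),
    append(E,D,A),
    append(E1,E,E2),
    split_on_substring(D,B,E2,C),!.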
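
For comparison, here is a minimal sketch of the same transformation done in a single pass over a character list, assuming SWI-Prolog 7 or later (where double-quoted text is a string object). The names `quote_like/1`, `clean_chars/2` and `clean_and_split/2` are hypothetical and do not appear in the code above. Like the code above, the sketch removes the two-character sequence `- ` (hyphen followed by a space), and it leans on the built-in `atomic_list_concat/3` in split mode instead of a hand-written splitter.

```prolog
:- encoding(utf8).

% Hypothetical helper names; a sketch, not the original code.

% Quote-like characters that are normalised to a plain apostrophe.
quote_like('"').
quote_like('“').
quote_like('”').
quote_like('‘').
quote_like('’').
quote_like('\'').

% clean_chars(+Chars0, -Chars): one left-to-right pass over a char list.
clean_chars([], []).
clean_chars(['\\'|T], Out) :- !,        % drop backslashes
    clean_chars(T, Out).
clean_chars(['-',' '|T], Out) :- !,     % drop "- " (hyphen, then space)
    clean_chars(T, Out).
clean_chars([C|T], ['\''|Out]) :-       % normalise quote characters
    quote_like(C), !,
    clean_chars(T, Out).
clean_chars([C|T], [C|Out]) :-          % copy everything else
    clean_chars(T, Out).

% clean_and_split(+Text, -Parts): clean Text, split on "\n\n",
% and discard empty pieces.
clean_and_split(Text, Parts) :-
    string_chars(Text, Chars0),
    clean_chars(Chars0, Chars),
    atom_chars(Atom, Chars),
    atomic_list_concat(Atoms, '\n\n', Atom),   % split mode
    maplist(atom_string, Atoms, Strings),
    exclude(==(""), Strings, Parts).
```

A query such as `clean_and_split("a\na\n\nB\nb\nb\n\nC\nc\nc", Parts)` should then give the three pieces expected for `1.txt`. Each character is inspected once and every recursive call is the last call of its clause, so the sketch avoids the repeated `string_concat/3` splitting that the code above performs for every character.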