I am using SWI-Prolog.
I have a csv file where the top row are probes and then each row is a sample:
1007_s_at 1053_at 117_at ...
GSM102447.CEL 1 0 0 ...
GSM102449.CEL 1 0 0 ...
GSM102451.CEL 1 0 0 ...
GSM102455.CEL 1 0 0 ...
GSM102507.CEL 1 0 1 ...
...
The actual file has over 20,000 columns ('probes') and no more than 150 rows ('samples').
I want to extract each relation and print them as facts in another file.
For example:
%probe_value_in_sample(Probe,Sample_Strip,ProbeValue).
probe_value_in_sample('1007_s_at','GSM102447',1).
etc
My code so far:
foreach(csv_read_file_row_list('GSE2109_BarCode.csv', List), assert(['samples'|List])).
probe_value_in_sample(Probe,Sample_Strip,ProbeValue):-
[samples|[samples,Empty|ProbeList]],Empty='', %the first value is empty
indexOf(ProbeList,Probe,IndexOfProbe),
[samples|[samples,Sample|SampleValues]],Sample\='',
nth0(IndexOfProbe,SampleValues,ProbeValue),
name(Sample, CharSample),
append(Char_Sample_Strip,".CEL",CharSample),
name(Sample_Strip,Char_Sample_Strip).
%IndexOf(MyList, MyElement, MyIndex).
indexOf([Element|_], Element, 0).
indexOf([_|Tail], Element, Index):-
indexOf(Tail, Element, Index1),
Index is Index1+1.
This seems to work well, but does not work or it is so slow to be unusable with a findall.
Any idea what could be the problem?
Thank you for any help.
Update
Thank you for your reply.
I have defined:
csv_read_file_row_list(File, List,Functor):-
csv_read_file_row(File,Row,[functor(Functor)]),Row=..List.
So I have an open file rather than a stream and the Functor variable is superfluous at the moment.
I am confused by how you have used maplist? And I can't quite get it to work.
I have tried:
:- dynamic samples/3.
csv_read_file_row_list(File, List,Functor):-
csv_read_file_row(File,Row,[functor(Functor)]),Row=..List.
prepare_db(File) :-
( nonvar(File) ; File = 'GSE2109_BarCode.csv' ),
%open(File, read, S),
csv_read_file_row_list(File, ['thing',_Empty|ColKeys],'thing'),
forall(csv_read_file_row_list(File, ['thing',RowKeyDirty|Samples],'thing'),
( clean_rowkey(RowKeyDirty, RowKey),
maplist(store_sample(RowKey), ColKeys, Samples)
)).
%close(S).
store_sample(RowKey, ColKey, Sample) :-
assertz(samples(RowKey, ColKey, Sample)).
clean_rowkey(RowKeyDirty, RowKey) :- append(RowKey, ".CEL", RowKeyDirty).
As well as:
:- dynamic samples/3.
csv_read_file_row_list(File, List,Functor):-
csv_read_file_row(File,Row,[functor(Functor)]),Row=..List.
prepare_db(File) :-
( nonvar(File) ; File = 'GSE2109_BarCode.csv' ),
%open(File, read, S),
csv_read_file_row_list(File, ['thing',_Empty|ColKeys],'thing'),
forall(csv_read_file_row_list(File, ['thing',RowKeyDirty|Samples],'thing'),
( clean_rowkey(RowKeyDirty, RowKey),
maplist(store_sample,[RowKey], ColKeys, Samples)
)).
%close(S).
store_sample(RowKey, ColKey, Sample) :-
assertz(samples(RowKey, ColKey, Sample)).
clean_rowkey(RowKeyDirty, RowKey) :- append(RowKey, ".CEL", RowKeyDirty).
But both fail.