I have a problem: I don't know how to create a matrix.
I have a dictionary of this type:
dico = {
"banana": "sp_345",
"apple": "ap_456",
"pear": "pe_645",
}
and a file like this:
sp_345_4567 pe_645_4567876 ap_456_45678 pe_645_4556789
sp_345_567 pe_645_45678
pe_645_45678 ap_456_345678
sp_345_56789 ap_456_345
pe_645_45678 ap_456_345678
sp_345_56789 ap_456_345
s45678 f45678 f456789 ap_456_52546135
What I want to do is create a matrix showing, for each line, how many times each value from the dictionary appears, so that I can find the lines where a value appears more than n times.
This is how I want to proceed:
Step 1: create a dictionary that maps each line number to the values found on that line, like this:
dictionary = {'1': ['sp_345_4567', 'pe_645_4567876', 'ap_456_45678', 'pe_645_4556789'], '2': ['sp_345_567', 'pe_645_45678'], '3': ['pe_645_45678', 'ap_456_345678'], '4': etc ...}
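A minimal sketch of step 1, assuming the values on each line are separated by whitespace (spaces or tabs). `parse_lines` is a hypothetical helper name, and `io.StringIO` stands in for the real file so the example runs on its own:

```python
import io

def parse_lines(handle):
    """Map 1-based line numbers to the list of values on that line."""
    return {
        number: line.split()  # split() with no argument handles spaces and tabs
        for number, line in enumerate(handle, start=1)
    }

# Stand-in for open("test.txt"): the first three lines of the file shown above.
sample = io.StringIO(
    "sp_345_4567 pe_645_4567876 ap_456_45678 pe_645_4556789\n"
    "sp_345_567 pe_645_45678\n"
    "pe_645_45678 ap_456_345678\n"
)
dictionary = parse_lines(sample)
print(dictionary[2])  # ['sp_345_567', 'pe_645_45678']
```

Note that the keys are integers in this sketch rather than the strings '1', '2', ... used above.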
Then I want to compare these values with my first dictionary, dico, and count, for example, how many times the banana key appears in each line (and do the same for every key of dico). The problem is that the values of dico are not equal to those in dictionary, because in the file they are followed by the pattern '_\w+'.
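One way this comparison could be expressed, as a sketch rather than a definitive solution: treat each value of dico as a prefix and accept a value from the line only if it is that prefix followed by `_` and at least one word character. `count_in_line` is a hypothetical helper name.

```python
import re

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_645",
}

def count_in_line(tokens, prefixes):
    """Count, for each key, the tokens of the form '<prefix>_<word chars>'."""
    counts = {}
    for key, prefix in prefixes.items():
        # re.escape guards against regex metacharacters in the prefix.
        pattern = re.compile(re.escape(prefix) + r"_\w+")
        counts[key] = sum(1 for token in tokens if pattern.fullmatch(token))
    return counts

# First line of the file shown above, already split into values.
line1 = ["sp_345_4567", "pe_645_4567876", "ap_456_45678", "pe_645_4556789"]
print(count_in_line(line1, dico))
# {'banana': 1, 'apple': 1, 'pear': 2}
```

Using `fullmatch` means a value such as `s45678` is not counted for any key.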
The idea is to build a final_dict that looks like this, so that I can make a matrix at the end:
final_dict = {'line1': {'banana': 1, 'apple': 1, 'pear': 2}, 'line2': etc ...}
Here is my code, which doesn't work:
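To make the target concrete, here is a sketch of how a final_dict of that shape could be turned into a matrix (one row per line, one column per fruit), and how the "more than n times" condition could be applied afterwards. The counts were worked out by hand from the first two lines of the file; `to_matrix`, `columns` and `n` are names chosen for illustration.

```python
final_dict = {
    "line1": {"banana": 1, "apple": 1, "pear": 2},
    "line2": {"banana": 1, "apple": 0, "pear": 1},
}

def to_matrix(counts_by_line, columns):
    """Return one row of counts per line, with columns in the given order."""
    return [
        [counts[column] for column in columns]
        for counts in counts_by_line.values()
    ]

columns = ["banana", "apple", "pear"]
matrix = to_matrix(final_dict, columns)
print(matrix)  # [[1, 1, 2], [1, 0, 1]]

# Keep only the (line, fruit) pairs whose count is greater than n.
n = 1
above_n = [
    (line, fruit)
    for line, counts in final_dict.items()
    for fruit, count in counts.items()
    if count > n
]
print(above_n)  # [('line1', 'pear')]
```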
import pprint
import re
import csv

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_645",
}
dictionary = {}
final_dict = {}
cnt = 0
with open("test.txt") as file:
    reader = csv.reader(file, delimiter='\t')
    for li in reader:
        grp = li
        number = 1
        for li in reader:
            dictionary[number] = grp
            number += 1
pprint.pprint(dictionary)
number_fruit = {}
for key1, val1 in dico.items():
    for key2, val2 in dictionary.items():
        if val1 == val2+'_\w+':
            final_dict[key1] = val2
Thanks for the help
EDIT :
I've tried using a dict comprehension:
import csv
import re

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_645",
}
with open("test.txt") as file:
    reader = csv.reader(file, delimiter='\t')
    for li in reader:
        pattern = re.search(dico["banana"]+"_\w+", str(li))
        if pattern:
            final_dict = {"line" + str(index + 1):{key:line.count(text) for key, text in dico.items()} for index, line in enumerate(reader)}
            print(final_dict)
But when I print my final dictionary, every count is 0, for banana and for the other keys as well:
{'line1': {'banana': 0, 'apple': 0, 'pear': 0}, 'line2': {'banana': 0, 'apple': 0, 'pear': 0}, 'line3': {'banana': 0, 'apple': 0, 'pear': 0}, 'line4': {'banana': 0, 'apple': 0, 'pear': 0}, 'line5': {'banana': 0, 'apple': 0, 'pear': 0}, 'line6': {'banana': 0, 'apple': 0, 'pear': 0}}
So now it looks a bit more like what I wanted, but the occurrence counts never go above 0 ... :/ Maybe my condition should be inside the dict comprehension?
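A sketch of what may be going on, assuming each `line` produced by `csv.reader` is a list of strings: `list.count(text)` only counts elements that are equal to `text`, so a value like `sp_345_4567` is never counted for `sp_345`. Counting the values that start with the prefix plus `_` inside the comprehension gives non-zero counts. Here `io.StringIO` stands in for the real file, and splitting on whitespace is an assumption about the delimiter.

```python
import io

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_645",
}

# Stand-in for open("test.txt"): the first two lines of the file shown above.
sample = io.StringIO(
    "sp_345_4567 pe_645_4567876 ap_456_45678 pe_645_4556789\n"
    "sp_345_567 pe_645_45678\n"
)

tokens = ["sp_345_4567", "pe_645_4567876"]
# list.count looks for elements equal to "sp_345", so it finds none.
print(tokens.count("sp_345"))  # 0

final_dict = {
    "line" + str(index + 1): {
        # Count values made of the prefix followed by "_" instead.
        key: sum(token.startswith(text + "_") for token in line.split())
        for key, text in dico.items()
    }
    for index, line in enumerate(sample)
}
print(final_dict)
# prints (wrapped here for readability):
# {'line1': {'banana': 1, 'apple': 1, 'pear': 2},
#  'line2': {'banana': 1, 'apple': 0, 'pear': 1}}
```

In this sketch the comprehension is the only thing that iterates over the file, so no line is consumed by an outer loop before the counting starts.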