How can I convert a CP1252 string into a list of characters in NumPy?

Question

I have a problem. I'm doing a class project where we need to implement some backtracking to solve a crossword, given a dictionary of words of different sizes. I want a fast way to access the words of this dictionary, so I use a Python dictionary of NumPy arrays, where the keys are word lengths and each value is a NumPy array of the words of that length.

For now I have a structure similar to:

words[4]: {['ABBS','BACK',...,'ZEAL']}

But I'm looking for a structure that lets me access every single character of each word, so it's easier to sort out the words from the dictionary. The structure I want is like:

words[4]:{['A','B','B','S'],['B','A','C','K'], ...,['Z','E','A','L']}
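One way to get that layout, as a minimal sketch: NumPy can reinterpret a fixed-width string array as single characters with `view`. The helper name `to_char_matrix` is hypothetical, and the sketch assumes every word passed in has exactly `size` characters (shorter words would be padded with empty strings, longer ones truncated).

```python
import numpy as np

def to_char_matrix(words, size):
    """Return a (len(words), size) array of single characters.

    Hypothetical helper: assumes every word has exactly `size` characters.
    """
    # Copy into a fixed-width unicode array so each word occupies
    # exactly `size` code points in one contiguous buffer.
    fixed = np.array(words, dtype=f"U{size}")
    # Reinterpret the same buffer as 1-character strings, one row per word.
    return fixed.view("U1").reshape(len(fixed), size)

chars = to_char_matrix(["ABBS", "BACK", "ZEAL"], 4)
print(chars[1])  # the characters of 'BACK' as a row
```

Because `view` does not loop over the words in Python, this avoids calling `list()` on every word.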

I have tried many methods to convert the strings into lists of characters, but none of them work.

What I'm doing is:

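import numpy as np

# Read one word per line (CP1252 is the encoding named in the title) and
# strip the trailing newline so that len() gives the real word length.
with open(self.dictionaryFile, 'r', encoding='cp1252') as dictionaryF:
    words = np.array([line.strip() for line in dictionaryF])

length = np.array([len(w) for w in words])

dictionary = {}
for i in self.maxLen:
    # words must be a NumPy array (not a list) for mask indexing to work.
    dictionary[i] = words[length == i]

self.dictionary = dictionary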

self.maxLen is a list of the unique word lengths in the crossword; I use it to avoid keeping unnecessary words.

self.dictionary is the dictionary used for solving, which I save as an attribute of my class.

I need a fast way to read and process all this so that reading a file doesn't take hours, because I sometimes need to read up to 600 thousand words.

All this is to solve the crossword faster. Suppose you have a crossword like:
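A possible way to cut down the work, as an unmeasured sketch: group the words by length in a single pass over the lines, keeping only the lengths that are needed, instead of scanning the full array once per length. The function name `group_by_length` and the sample lines are made up for illustration.

```python
from collections import defaultdict

import numpy as np

def group_by_length(lines, wanted_lengths):
    """Group stripped words by length, keeping only the wanted lengths.

    Hypothetical helper: `lines` is any iterable of strings, e.g. an open file.
    """
    wanted = set(wanted_lengths)
    groups = defaultdict(list)
    for line in lines:
        word = line.strip()  # drop the trailing newline
        if len(word) in wanted:
            groups[len(word)].append(word)
    # One NumPy array of words per length.
    return {n: np.array(ws) for n, ws in groups.items()}

groups = group_by_length(["ABS\n", "ABBS\n", "BACK\n", "HELLO\n"], [3, 4])
```

With a real file, `lines` could be the file object itself, so the words are never all held in one intermediate list.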

0   0   0   #   0   0
0   #   0   0   0   0
0   #   0   #   0   0
0   0   0   0   #   0
0   #   0   #   0   0
0   0   0   0   0   #
#   0   #   0   0   #

The 0s are the empty positions where you can assign a word. As you can see, there are some positions where two words intersect, and for these I need to sort out from the dictionary the words that have the same letter as the assigned word at the intersection. For example:

In the first row you can assign ABS to the first three positions; if you then want to assign a word in the first column, it needs to start with A.
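With words stored as rows of single characters, this kind of intersection check can be sketched as a boolean mask on one column. The function name `candidates` and the three sample words are assumptions for illustration only.

```python
import numpy as np

def candidates(chars, position, letter):
    """Return the rows of `chars` whose character at `position` equals `letter`.

    Hypothetical helper: `chars` is a 2-D array with one character per cell.
    """
    return chars[chars[:, position] == letter]

# Three 3-letter words, one row per word.
chars = np.array([list("ABS"), list("BAT"), list("ACE")])

# Words that could go in the first column once 'ABS' sits in the first row.
starts_with_a = candidates(chars, 0, "A")
```

Several intersections can be combined by and-ing the masks, for example `(chars[:, 0] == "A") & (chars[:, 2] == "E")`.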

Why do you need the string to be a list? What feature of lists does the string not already support that you need? — Mad Physicist, Oct 11 '19 at 20:59
On top of what @MadPhysicist has asked, could you provide more context/background on this? This seems like a pretty basic case of the [XY problem](http://xyproblem.info). How are you actually solving the crossword, once you know which words could fit? What kind of data structure are you using for the crossword puzzle grid? Furthermore, the topic of crossword puzzles has come up multiple times before: https://stackoverflow.com/questions/44606961/solving-crosswords, https://stackoverflow.com/questions/2288901/best-data-structure-for-crossword-puzzle-search — AMC, Oct 13 '19 at 05:14
Another link: https://stackoverflow.com/questions/45978894/empty-crossword-solver-in-python — AMC, Oct 13 '19 at 05:18
To browse the words from the dictionary more easily and effectively, I will use backtracking with forward checking. I do the forward checking by taking the intersection values and sorting out the words by intersection. Let's say I have assigned the word 'ABBS' and it intersects another word at position 1 (the first B). To assign that next word, I directly take from the dictionary the words that have the length of that word and the value B at the intersection position. — Panda.V5, Oct 14 '19 at 09:38
I already got it converted to lists of chars, but it still takes about 3 to 4 seconds to process 600K words. I don't know if there's any way to do it faster. I use:

aux = np.loadtxt(self.dictionaryFile, dtype=str)
length = np.array([len(i) for i in aux])
for size in self.maxLen:
    aux2 = aux[np.where(length == size)]
    z = []
    for i in aux2:
        z.append(list(i))
    self.dictionary[size] = np.array(z)

@MadPhysicist — Panda.V5, Oct 14 '19 at 09:39
