0

I want to create a dictionary from a list of strings. For example, I have this list:

AAAA
AAAA
AAAA
BBBB
BBBB
CCCC
CCCC
CCCC
....

Then I want to create a dictionary that assigns a number to each distinct string. How can I do that?

I have tried some code, but I still have no idea how to proceed:

import os
path = "directoryA"
dirList = os.listdir(path)


with open("check.txt", "w") as a:
    for path, subdirs, files in os.walk(path):
        for filename in files:
            # I have split the text, and now I want to create a dictionary
            # from it

            mylist = filename.split("_")  # the filenames look like AAAA_0, so
                                          # splitting gives ['AAAA', '0']

            k = mylist[0]  # I only take the 'AAAA' part after splitting
            print(k)  # this only prints text; I want to put these values
                      # into a dictionary

This is the output of print(k). It is printed text, not a list:

AAAA
AAAA
AAAA
BBBB
BBBB
CCCC
CCCC
CCCC
....

This is my expected result

myDic ={
    'AAAA': 0,
    'BBBB': 1,
    'CCCC': 2,
    'DDDD': 3,
    # ... and so on
}

6 Answers

2

Assuming the strings you collect look like li below, start by getting the unique elements with a set, then sort them alphabetically.

After that, use a dictionary comprehension with enumerate to generate your dictionary:

li = [
    "AAAA",
    "AAAA",
    "AAAA",
    "BBBB",
    "BBBB",
    "CCCC",
    "CCCC",
    "CCCC"]

# Get the unique strings by converting to a set, then sort them lexicographically
li = sorted(set(li))

# Create the dictionary via a dictionary comprehension and enumerate
dct = {item: idx for idx, item in enumerate(li)}
print(dct)

The output will be

{'AAAA': 0, 'BBBB': 1, 'CCCC': 2}

You should be able to build the list of strings li like so:

import os

path = "directoryA"
li = []

for dirpath, subdirs, files in os.walk(path):
    for filename in files:
        # Filenames look like AAAA_0; splitting on "_" gives ['AAAA', '0']
        mylist = filename.split("_")

        # Keep only the 'AAAA' part and append it to li
        k = mylist[0]
        li.append(k)

# Build the dictionary from li only after the loops have finished
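
The comments below suggest the confusion came from building or printing the result inside the loop. A minimal end-to-end sketch that collects first and builds the dictionary once at the end might look like this. The function name `build_label_dict` is made up for illustration, and the sketch assumes every filename has the form PREFIX_NUMBER as described in the question:

```python
import os


def build_label_dict(root):
    """Map each filename prefix (the text before the first "_") to an index."""
    prefixes = set()
    for _dirpath, _subdirs, files in os.walk(root):
        for filename in files:
            # "AAAA_0".split("_")[0] -> "AAAA"
            prefixes.add(filename.split("_")[0])
    # Build the dictionary once, after the walk has finished,
    # numbering the unique prefixes in alphabetical order.
    return {prefix: idx for idx, prefix in enumerate(sorted(prefixes))}
```

For a directory containing AAAA_0, AAAA_1, BBBB_0 and CCCC_0, calling build_label_dict on it returns {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}.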
Devesh Kumar Singh
  • I tried this but the result is like this: {'AAAA': 0}, {'AAAA': 0}, {'BBBB': 0}, {'CCCC': 0}. What I want is to give the same string the same number, and increment the number if the string is different from the previous string – Bob Adi Setiawan May 12 '19 at 07:37
  • Is your input string the way I have defined it? What input are you taking? Is it different from what I took from your example, @BobAdiSetiawan? – Devesh Kumar Singh May 12 '19 at 07:38
  • I don't think so, because when I check the output, it gives multiple lines of text, for example AAAA, AAAA, BBBB, BBBB, CCCC, CCCC, etc. From this output, I want to create a dictionary – Bob Adi Setiawan May 12 '19 at 07:45
  • Is it a list of strings? Check my updated answer! @BobAdiSetiawan ! – Devesh Kumar Singh May 12 '19 at 07:55
  • The output is still not a list of strings. It just outputs text after I did the splitting part – Bob Adi Setiawan May 12 '19 at 07:57
  • Then what is the input? A string? A list of strings? I provided both examples to you! Can you please add more clarification to the question? Also, as I said, the code I provided is independent of what you are trying to do overall outside this code! You can take this and implement it whichever way you like @BobAdiSetiawan :) – Devesh Kumar Singh May 12 '19 at 07:59
  • The input is text that comes from splitting. For example AAAA_0, AAAA_1, BBBB_0, CCCC_0; after splitting and printing the text, I get this output: AAAA, AAAA, BBBB, CCCC. I want to put this text into a dictionary – Bob Adi Setiawan May 12 '19 at 08:02
  • Okay, then append all this output `AAAA`, `BBBB` etc. into one big list (so `li = ["AAAA", "BBBB"..]`), which will look like what I did in my example, and from there the solution I provided will work @BobAdiSetiawan – Devesh Kumar Singh May 12 '19 at 08:04
  • I tried to append the output to a list but the result is [AAAA] [AAAA, BBBB] [AAAA, BBBB, CCCC], each on a different line. So how do I take only the [AAAA, BBBB, CCCC] list? – Bob Adi Setiawan May 12 '19 at 09:09
  • Can you add whatever code you are trying to the question so that I can see what is going on @BobAdiSetiawan – Devesh Kumar Singh May 12 '19 at 09:28
  • I tried to append as well, but I found that the list is repeated on different lines – Bob Adi Setiawan May 12 '19 at 09:58
  • Just make a global list `li`, and append `k` to that list, then process it according to what I wrote in the answer @BobAdiSetiawan – Devesh Kumar Singh May 12 '19 at 10:50
  • Great, Glad to help! If the answer helped you, please consider marking my answer as accepted if it helped you :) @BobAdiSetiawan Also consider taking a look at https://stackoverflow.com/help/someone-answers – Devesh Kumar Singh May 12 '19 at 12:17
  • Thanks for the accept @BobAdiSetiawan Have a great day :) – Devesh Kumar Singh May 12 '19 at 12:58
1

You can use itertools.groupby to group the strings, assuming they are sorted as you have them (if not, sort them first). Then enumerate() over the groups, which gives each group its index:

from itertools import groupby
l = [
    "AAAA", 
    "AAAA", 
    "AAAA", 
    "BBBB",
    "BBBB",
    "CCCC",
    "CCCC",
    "CCCC"]

d = {key:i for i, (key, group) in enumerate(groupby(l))}
# {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
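
To illustrate why the sorting caveat matters, here is a small sketch with a made-up unsorted list (`unsorted_names` is hypothetical, not from the question). groupby only merges adjacent equal items, so a string that reappears later forms a second group and its index is overwritten:

```python
from itertools import groupby

# Hypothetical input in which "AAAA" appears in two separate runs.
unsorted_names = ["AAAA", "BBBB", "AAAA", "CCCC"]

# Without sorting there are four groups (AAAA, BBBB, AAAA, CCCC),
# so the second "AAAA" group replaces index 0 with index 2.
without_sort = {key: i for i, (key, _group) in enumerate(groupby(unsorted_names))}
print(without_sort)  # {'AAAA': 2, 'BBBB': 1, 'CCCC': 3}

# Sorting first makes equal strings adjacent, giving contiguous indices.
with_sort = {key: i for i, (key, _group) in enumerate(groupby(sorted(unsorted_names)))}
print(with_sort)  # {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
```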

If you are reading from a file and the strings are not sorted, you can add an entry and increment a counter each time you find a string that is not yet in the dict. The values then follow the order in which each string is first seen. For example:

from itertools import count, filterfalse

i = count()  # start at 0 to match the expected output
d = {}

with open('test.txt') as f:
    # filterfalse skips every line whose stripped text is already a key in d
    for line in filterfalse(lambda l: l.strip() in d, f):
        d[line.strip()] = next(i)
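
The same first-seen numbering can also be sketched without a separate counter, using dict.setdefault with the current size of the dict as the next index. This is an alternative to the filterfalse version, not a restatement of it. The helper name `number_by_first_seen` is made up, and skipping blank lines is an added assumption:

```python
def number_by_first_seen(lines):
    """Assign 0, 1, 2, ... to each distinct stripped line, in first-seen order."""
    d = {}
    for line in lines:
        key = line.strip()
        if not key:
            continue  # assumption: blank lines should not become keys
        # len(d) is evaluated before the call, so a new key gets the next
        # free index; an existing key keeps the value it already has.
        d.setdefault(key, len(d))
    return d


print(number_by_first_seen(["BBBB\n", "AAAA\n", "BBBB\n", "\n", "CCCC\n"]))
# {'BBBB': 0, 'AAAA': 1, 'CCCC': 2}
```

Any iterable of strings works as input, including an open file object.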
Mark
1

You can use dict.fromkeys() to build a dict with the unique strings as keys (dicts preserve insertion order in Python 3.7+) and count() to fill in the values:

from itertools import count

lst = ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "CCCC"]

dct = dict.fromkeys(lst)
c = count()

for key in dct:
    dct[key] = next(c)

print(dct)
# {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
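
If a single expression is preferred, the two steps can be combined by enumerating the de-duplicated keys directly. This is only a sketch of the same idea, and it relies on dicts preserving insertion order (Python 3.7+):

```python
lst = ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "CCCC"]

# dict.fromkeys(lst) drops duplicates while keeping first-seen order;
# enumerate then numbers the remaining keys from 0.
dct = {key: idx for idx, key in enumerate(dict.fromkeys(lst))}

print(dct)
# {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
```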
Devesh Kumar Singh
Mykola Zotko
0

Assuming the keys of the dictionary are already unique:

keys = ['A', 'B', 'C']

Then:

ids = range(len(keys))  # named ids to avoid shadowing the built-in id()
d = dict(zip(keys, ids))
marc_s
RaphaëlR
0

I would do it the following way:

data = ['A','A','A','B','B','C','C','D','C']
unique = [i for inx,i in enumerate(data) if data.index(i)==inx]
print(unique) # ['A', 'B', 'C', 'D']
d = {i: inx for inx,i in enumerate(unique)}
print(d) # {'A': 0, 'B': 1, 'C': 2, 'D': 3}

The idea behind this method: keep a value from the list only at its first occurrence (the same value did not appear earlier). I use the fact that the .index method of a list always returns the lowest matching index. Note that with this method equal values do not have to be neighbors. Because .index scans the list on every call, this takes quadratic time on long lists.

Daweo
0

First you have to remove duplicates, based on this answer: How do you remove duplicates from a list whilst preserving order?

So it will look like this:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

l = [
"AAAA", 
"AAAA", 
"AAAA", 
"BBBB",
"BBBB",
"CCCC",
"CCCC",
"CCCC"]

# first remove duplicates (order of first appearance is preserved)
s = f7(l)

# create the desired dict
dict(zip(s, range(len(s))))
# {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}