How to create a dictionary from a list of strings?

Question

I want to create a dictionary from a list of strings. For example, I have this list:

AAAA
AAAA
AAAA
BBBB
BBBB
CCCC
CCCC
CCCC
....

Then I want to create a dictionary from it that assigns a number to each distinct string. How can I do that?

I have explored some code but still have no idea how to do it:

import os
path = "directoryA"
dirList = os.listdir(path)


with open("check.txt", "w") as a:
    for path, subdirs, files in os.walk(path):
        for filename in files:
            # I have split the text and now I want to create a dictionary from it

            mylist = filename.split("_")  # the text format is AAAA_0; splitting gives ['AAAA', '0']

            k = mylist[0]  # I only take the 'AAAA' string after splitting
            print(k)  # this only prints text; I want to put it into a dictionary

This is the output of print(k); note that these are separate strings, not a list:

AAAA
AAAA
AAAA
BBBB
BBBB
CCCC
CCCC
CCCC
....

This is my expected result:

myDic ={
    'AAAA': 0,
    'BBBB': 1,
    'CCCC': 2,
    'DDDD': 3,
    # ... and so on
}

Devesh Kumar Singh · Accepted Answer · score 2 · 2019-05-12T10:50:03.203

Assuming the contents of check.txt look like li, start by getting all unique elements in your list of strings by using a set, and then sort the unique elements alphabetically.

After that, use a dictionary comprehension and enumerate to generate your dictionary.

li = [
    "AAAA",
    "AAAA",
    "AAAA",
    "BBBB",
    "BBBB",
    "CCCC",
    "CCCC",
    "CCCC"]

# Get the unique strings by converting to a set
li = list(set(li))

# Sort the list lexicographically (a set has no defined order)
li = sorted(li)

# Create your dictionary via a dictionary comprehension and enumerate
dct = {item: idx for idx, item in enumerate(li)}
print(dct)

The output will be:

{'AAAA': 0, 'BBBB': 1, 'CCCC': 2}

You can build the list of strings li like so:

import os

path = "directoryA"
li = []

for root, subdirs, files in os.walk(path):
    for filename in files:
        # The file name format is AAAA_0; splitting on "_" gives ['AAAA', '0']
        mylist = filename.split("_")

        k = mylist[0]  # keep only the 'AAAA' part
        # Append the item to li
        li.append(k)

# After the loop, li holds every prefix; build the dictionary as above
dct = {item: idx for idx, item in enumerate(sorted(set(li)))}
print(dct)
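To check the whole pipeline without a real directoryA, here is a minimal sketch that creates a few hypothetical files named in the AAAA_0 style inside a temporary directory, collects the prefixes, and builds the dictionary only after the loop has finished. The helper name build_prefix_index and the sample file names are assumptions for illustration.

```python
import os
import tempfile


def build_prefix_index(path):
    """Hypothetical helper: map each file-name prefix (text before the
    first "_") to an index, numbering the sorted unique prefixes from 0."""
    li = []
    for root, subdirs, files in os.walk(path):
        for filename in files:
            li.append(filename.split("_")[0])  # 'AAAA_0' -> 'AAAA'
    # Build the dictionary once, after every file name has been seen
    return {item: idx for idx, item in enumerate(sorted(set(li)))}


# Hypothetical sample files standing in for directoryA
with tempfile.TemporaryDirectory() as tmp:
    for name in ["AAAA_0", "AAAA_1", "BBBB_0", "CCCC_0", "CCCC_1"]:
        open(os.path.join(tmp, name), "w").close()  # create an empty file
    result = build_prefix_index(tmp)

print(result)  # {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
```

The key point is where the dictionary is built: printing or building inside the loop yields one result per file, while building after the loop yields a single dictionary.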

edited May 12 '19 at 10:50

answered May 12 '19 at 07:26

Devesh Kumar Singh

I tried this but the result looks like this: {'AAAA': 0}, {'AAAA': 0}, {'BBBB': 0}, {'CCCC': 0}. What I want is to map the same string to the same number, and increment the number if the string is different from the previous string – Bob Adi Setiawan May 12 '19 at 07:37
Is your input string the way I have defined it? What input string are you taking; is it different from what I took from your example @BobAdiSetiawan ? – Devesh Kumar Singh May 12 '19 at 07:38
I don't think so, because when I check the output, it prints multiple separate strings, for example AAAA, AAAA, BBBB, BBBB, CCCC, CCCC, etc. From this output, I want to create a dictionary – Bob Adi Setiawan May 12 '19 at 07:45
Is it a list of strings? Check my updated answer! @BobAdiSetiawan – Devesh Kumar Singh May 12 '19 at 07:55
The output is still not a list of strings. It just prints text after I do the splitting part – Bob Adi Setiawan May 12 '19 at 07:57
Then what is the input? A string? A list of strings? I provided both examples to you! Can you please add more clarification to the question? Also, as I said, the code I provided is independent of what you are trying to do overall outside this code! You can take this and implement it whichever way you like @BobAdiSetiawan :) – Devesh Kumar Singh May 12 '19 at 07:59
The input is text obtained by splitting file names. For example, AAAA_0, AAAA_1, BBBB_0, CCCC_0; after splitting, printing the text gives AAAA, AAAA, BBBB, CCCC. I want to put this text into a dictionary – Bob Adi Setiawan May 12 '19 at 08:02
Okay, then append all these outputs `AAAA`, `BBBB` etc. into one big list (so `li = ["AAAA", "BBBB"..]`), which will look like what I did in my example, and from there the solution I provided will work @BobAdiSetiawan – Devesh Kumar Singh May 12 '19 at 08:04
I tried to append the output to a list, but the result is [AAAA] [AAAA, BBBB] [AAAA, BBBB, CCCC], each on a different line. So how do I get only the [AAAA, BBBB, CCCC] list? – Bob Adi Setiawan May 12 '19 at 09:09
Can you add whatever code you are trying to the question so that I can see what is going on @BobAdiSetiawan – Devesh Kumar Singh May 12 '19 at 09:28
I tried appending as well, but I found that it gets repeated on different lines – Bob Adi Setiawan May 12 '19 at 09:58
Just make a global list `li`, append `k` to that list, and then process it according to what I wrote in the answer @BobAdiSetiawan – Devesh Kumar Singh May 12 '19 at 10:50
Great, glad to help! If the answer helped you, please consider marking it as accepted :) @BobAdiSetiawan Also consider taking a look at https://stackoverflow.com/help/someone-answers – Devesh Kumar Singh May 12 '19 at 12:17
Thanks for the accept @BobAdiSetiawan Have a great day :) – Devesh Kumar Singh May 12 '19 at 12:58
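The comment thread above comes down to the dictionary being built (or printed) once per file inside the loop. If you would rather fill the dictionary while looping, one option (a swap-in technique, not the accepted answer's method) is dict.setdefault with len(d) as the default: a string that is already a key keeps its number, and a new string gets the next unused number. Numbers then follow first-seen order rather than alphabetical order. The prefixes list below is an assumed stand-in for the successive k values produced by the loop.

```python
# Assumed stand-in for the k values printed inside the os.walk loop
prefixes = ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "CCCC"]

myDic = {}
for k in prefixes:
    # len(myDic) is evaluated first and is the next unused number;
    # setdefault only stores it when k is not already a key,
    # so repeated strings keep the number they were first given
    myDic.setdefault(k, len(myDic))

print(myDic)  # {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
```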

Mark · Answer 2 · score 1 · 2019-05-12T08:05:52.113

You can use itertools.groupby to group the strings, assuming they are sorted as you have them (if not, sort them first). Then enumerate() over the groups, which numbers them consecutively:

from itertools import groupby
l = [
    "AAAA", 
    "AAAA", 
    "AAAA", 
    "BBBB",
    "BBBB",
    "CCCC",
    "CCCC",
    "CCCC"]

d = {key:i for i, (key, group) in enumerate(groupby(l))}
# {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
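The sorting caveat matters: groupby only merges adjacent equal items, so on unsorted input the same string forms several groups, and the dict comprehension keeps the index of its last group. A small sketch with a made-up unsorted list:

```python
from itertools import groupby

unsorted = ["AAAA", "BBBB", "AAAA", "CCCC"]

# Without sorting, 'AAAA' forms two separate groups (indices 0 and 2),
# and the later index overwrites the earlier one
wrong = {key: i for i, (key, group) in enumerate(groupby(unsorted))}
print(wrong)  # {'AAAA': 2, 'BBBB': 1, 'CCCC': 3}

# Sorting first makes equal strings adjacent, giving consecutive indices
right = {key: i for i, (key, group) in enumerate(groupby(sorted(unsorted)))}
print(right)  # {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
```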

If you are reading from a file and the strings are not sorted, you can add an entry and increment the counter each time you find a string not yet in the dict. The values will then be ordered by the first time you see each string. For example:

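from itertools import count, filterfalse

i = count()  # start at 0 to match the expected output in the question
d = {}

with open('test.txt') as f:
    # filterfalse lazily skips lines whose stripped text is already a key;
    # d grows as we go, so each new string is numbered exactly once
    for line in filterfalse(lambda l: l.strip() in d, f):
        d[line.strip()] = next(i)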

edited May 12 '19 at 08:05

answered May 12 '19 at 07:42

Mark

Then I need to define l. But what if I want to do it automatically, because there are hundreds of these lists? Anyway, thank you – Bob Adi Setiawan May 12 '19 at 07:46
@BobAdiSetiawan you don't need to define `l` you can pass any iterable to `groupby` – Mark May 12 '19 at 07:47
I am always amazed how powerful can itertools be ! – Devesh Kumar Singh May 12 '19 at 07:49
Ok, I will try this method. Thank you – Bob Adi Setiawan May 12 '19 at 07:53

score 1 · Answer 3 · edited May 12 '19 at 08:25

You can use dict.fromkeys() to build a dict with one key per unique string, and itertools.count() to fill in the values:

from itertools import count

lst = ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "CCCC"]

dct = dict.fromkeys(lst)
c = count()

for key in dct:
    dct[key] = next(c)

print(dct)
# {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
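The loop can also be folded into one expression: zip pairs each unique key with the next number from count() and stops when the keys run out, so the infinite counter is safe. This relies on dicts preserving insertion order, which the language guarantees from Python 3.7 on.

```python
from itertools import count

lst = ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "CCCC"]

# dict.fromkeys(lst) keeps one key per distinct string, in first-seen order;
# iterating it yields the keys, and zip stops when they are exhausted
dct = dict(zip(dict.fromkeys(lst), count()))
print(dct)  # {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
```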

edited May 12 '19 at 08:25

Devesh Kumar Singh

answered May 12 '19 at 08:08

Mykola Zotko

score 0 · Answer 4 · edited May 13 '19 at 10:35

Assuming the keys of the dictionary are:

keys = ['A', 'B', 'C']

Then:

ids = range(len(keys))  # 0, 1, 2, ...; named ids to avoid shadowing the built-in id()
d = dict(zip(keys, ids))
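This assumes keys contains no duplicates. If you start from the raw list in the question, dict(zip(...)) keeps the index of the last occurrence of each repeated string, so deduplicate first. A sketch, using dict.fromkeys for order-preserving deduplication (a swap-in, not part of this answer):

```python
raw = ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "CCCC"]

# Zipping the raw list directly: later duplicates overwrite earlier indices
naive = dict(zip(raw, range(len(raw))))
print(naive)  # {'AAAA': 2, 'BBBB': 4, 'CCCC': 7}

# Deduplicate while preserving first-seen order, then number the keys
keys = list(dict.fromkeys(raw))
d = dict(zip(keys, range(len(keys))))
print(d)  # {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
```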

edited May 13 '19 at 10:35

marc_s

answered May 12 '19 at 07:42

RaphaëlR

score 0 · Answer 5 · answered May 12 '19 at 07:56

I would do it the following way:

data = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'C']
# Keep an element only at the position of its first occurrence
unique = [i for inx, i in enumerate(data) if data.index(i) == inx]
print(unique)  # ['A', 'B', 'C', 'D']
d = {i: inx for inx, i in enumerate(unique)}
print(d)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}

The idea behind this method: take a value from the list only if this is its first occurrence (the same value did not appear earlier). I used the fact that the list .index method always returns the lowest matching index. Note that with this method, equal values do not have to be adjacent.

score 0 · Answer 6 · answered May 12 '19 at 08:33

First, remove duplicates while preserving order, based on this answer: How do you remove duplicates from a list whilst preserving order?

So it will look like this:

def f7(seq):
    seen = set()
    seen_add = seen.add
    # seen_add(x) returns None (falsy), so a new x is recorded and kept;
    # an x already in seen is skipped
    return [x for x in seq if not (x in seen or seen_add(x))]

l = [
    "AAAA",
    "AAAA",
    "AAAA",
    "BBBB",
    "BBBB",
    "CCCC",
    "CCCC",
    "CCCC"]

# First remove duplicates (order is preserved)
s = f7(l)

# Create the desired dict
d = dict(zip(s, range(len(s))))
print(d)  # {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
