
I want to create a word dictionary. The dictionary looks like

words_meanings= {
                "rekindle": "relight",
                "pesky":"annoying", 
                "verge": "border",
                "maneuver": "activity",
                "accountability":"responsibility",
                }

keys_letter=[]

for x in words_meanings:
  keys_letter.append(x)
print(keys_letter)

Output: ['rekindle', 'pesky', 'verge', 'maneuver', 'accountability']

Here rekindle, pesky, verge, maneuver and accountability are the keys, and relight, annoying, border, activity and responsibility are the values.

Now I want to create a CSV file and have my code take its input from that file.

The file looks like

rekindle | pesky   |  verge |  maneuver |  accountability
relight  | annoying|  border|  activity |  responsibility

So far I have used this code to load the file and read the data from it.

from google.colab import files
uploaded = files.upload()
import pandas as pd 
data = pd.read_csv("words.csv")
data.head()
import csv
reader = csv.DictReader(open("words.csv", 'r'))
words_meanings = []
for line in reader:
  words_meanings.append(line)
print(words_meanings)

This is the output of print(words_meanings)

[OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]

It looks very odd to me.

keys_letter=[]
for x in words_meanings:
  keys_letter.append(x)
print(keys_letter)

Here I create an empty list and want to append only the keys. But the output is [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]

I am confused. In the first code block the list contained only the keys, but now it contains both the keys and their values. How can I fix this?

Encipher
  • First you work with a normal dictionary that has many elements; next you work with a list that has only one element, an `OrderedDict`, and this makes the difference. A dictionary and a list with a dictionary inside are two different objects. When you use a dictionary in a `for` loop, it gives you the keys. When you use a list containing a dictionary in a `for` loop, it gives you the first element of the list, which is the whole dictionary with its keys and values. – furas Jan 21 '21 at 03:36

4 Answers


I would suggest formatting your CSV with each key and its value on the same row, like this:

rekindle,relight
pesky,annoying
verge,border

This way, the following code will work:

file_name = "words.csv"
words_meanings = {}
with open(file_name, 'r') as file:
    for line in file:
        # each line looks like "key,value"
        key, value = line.split(",")
        words_meanings[key] = value.rstrip("\n")

If you want a list of the keys: list_of_keys = list(words_meanings.keys())

To add keys and values to the file:

def add_values(key: str, value: str, file_name: str):
    # append one "key,value" line to the end of the file
    with open(file_name, 'a') as file:
        file.write(f"\n{key},{value}")

key = input("Input the key you want to save: ")
value = input(f"Input the value you want to save to {key}: ")
add_values(key, value, file_name)
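If a meaning can itself contain a comma, appending with a plain formatted string would produce a row that no longer splits into exactly two fields. A minimal sketch of one way around that, assuming the standard `csv` module is acceptable here; the helper name `add_value` and the temporary file are only for illustration:

```python
import csv
import os
import tempfile

def add_value(key, value, file_name):
    """Append one key/value row; csv.writer quotes fields that contain commas."""
    with open(file_name, 'a', newline='', encoding='utf-8') as f:
        csv.writer(f).writerow([key, value])

# Demo in a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'words.csv')
    add_value('pesky', 'annoying', path)
    add_value('verge', 'border, edge', path)  # value with an embedded comma
    with open(path, newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f))

print(rows)
```

Reading the file back with `csv.reader` keeps 'border, edge' as a single field, which a bare `line.split(",")` would not.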
lwashington27
  • If I format my CSV row-wise, doesn't that mean they are all values? – Encipher Jan 21 '21 at 03:05
  • No, if you add the key, then a comma, then the value, the system will know that the first column is all keys and the second is all values, as long as there is one key and one value per row. – lwashington27 Jan 21 '21 at 03:08
  • I just ran the code with the updated rows but I am getting an error. Can you show how I should enter the input in the CSV file? A CSV is nothing but an Excel file. In my input, A1 = rekindle, B1 = relight, A2 = pesky, B2 = annoying. But when I run any code over it, all the words are treated as values. – Encipher Jan 21 '21 at 03:18
  • If you're adding keys and values through Excel without saving the sheet as a CSV, then the data won't be the same. XLSX saves more information, in a different format from CSV. – lwashington27 Jan 21 '21 at 15:04
  • Since OP sees fit to take your answer as their own, it's only fair that you get an upvote. – mhawke Jan 26 '21 at 01:19

You run the same block of code, but you use it with different objects, and this gives different results.

First you use a normal dictionary (check type(words_meanings))

 words_meanings = {
            "rekindle": "relight",
            "pesky":"annoying", 
            "verge": "border",
            "maneuver": "activity",
            "accountability":"responsibility",
            }

and the for-loop gives you the keys from this dictionary.

You could get the same with

 keys_letter = list(words_meanings.keys())

or even

 keys_letter = list(words_meanings)

Later you use a list with a single dictionary inside it (check type(words_meanings))

 words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]

and the for-loop gives you the elements of this list, not the keys of the dictionary inside it. So you just move the whole dictionary from one list to another.

You could get the same with

 keys_letter = words_meanings.copy()

or, equivalently,

 keys_letter = list(words_meanings)

Full example:

from collections import OrderedDict
words_meanings = {
                "rekindle": "relight",
                "pesky":"annoying", 
                "verge": "border",
                "maneuver": "activity",
                "accountability":"responsibility",
                }

print(type(words_meanings))

keys_letter = []
for x in words_meanings:
  keys_letter.append(x)
print(keys_letter)

#keys_letter = list(words_meanings.keys())
keys_letter = list(words_meanings)
print(keys_letter)


words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]

print(type(words_meanings))

keys_letter = []
for x in words_meanings:
  keys_letter.append(x)
print(keys_letter)

#keys_letter = words_meanings.copy()
keys_letter = list(words_meanings)
print(keys_letter)
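To actually collect the keys in the second case, the loop has to go one level deeper: first over the list, then over each dictionary in it. A minimal sketch, assuming the list has the shape shown in the question; the name `rows` is only for illustration, and `lstrip('\ufeff')` is one possible way to drop the byte-order mark stuck to the first key:

```python
from collections import OrderedDict

# The list that csv.DictReader produced in the question: one dict per data row.
rows = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]

keys_letter = []
for row in rows:          # iterating the list yields whole dictionaries
    for key in row:       # iterating a dictionary yields its keys
        # drop the byte-order mark left on the first header field
        keys_letter.append(key.lstrip('\ufeff'))

print(keys_letter)  # ['rekindle', 'pesky']
```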
furas

The default field separator for the csv module is a comma. Your CSV file uses the pipe or bar symbol |, and the fields also seem to be fixed width. So, you need to specify | as the delimiter to use when creating the CSV reader.

Also, your CSV file appears to be encoded as big-endian UTF-16 Unicode text (UTF-16-BE). The file contains a byte-order mark (BOM) but Python is not stripping it off, so you will notice that the string '\ufeffrekindle' contains the FEFF UTF-16-BE BOM. That can be dealt with by specifying encoding='utf-16' when you open the file.

import csv

with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(f, delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)

Running this on your CSV file produces this:

{'rekindle ': 'relight  ', 'pesky   ': 'annoying', 'verge ': 'border', 'maneuver ': 'activity ', 'accountability': 'responsibility'}

Notice that there is trailing whitespace in the key and values. skipinitialspace=True removed the leading whitespace, but there is no option to remove the trailing whitespace. That can be fixed by exporting the CSV file from Excel without specifying a field width. If that can't be done, then it can be fixed by preprocessing the file using a generator:

import csv

def preprocess_csv(f, delimiter=','):
    # assumes that fields can not contain embedded new lines
    for line in f:
        yield delimiter.join(field.strip() for field in line.split(delimiter))

with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(preprocess_csv(f, '|'), delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)

which now outputs the stripped keys and values:

{'rekindle': 'relight', 'pesky': 'annoying', 'verge': 'border', 'maneuver': 'activity', 'accountability': 'responsibility'}
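An alternative to preprocessing is to strip the fields after parsing and pair the two rows up directly. The following is only a sketch under the assumption that the file really has the two-row layout shown in the question (words on the first row, meanings on the second); `io.StringIO` stands in for the real file, and the name `sample` is illustrative:

```python
import csv
import io

# Stand-in for the two-row file from the question, so the sketch needs no file.
sample = (
    "rekindle | pesky   |  verge |  maneuver |  accountability\n"
    "relight  | annoying|  border|  activity |  responsibility\n"
)

reader = csv.reader(io.StringIO(sample), delimiter='|')
header = [field.strip() for field in next(reader)]    # first row: the words
meanings = [field.strip() for field in next(reader)]  # second row: the meanings

# Pair each word with the meaning in the same column.
words_meanings = dict(zip(header, meanings))
keys_letter = list(words_meanings)

print(words_meanings)
print(keys_letter)
```

With a real file, the `io.StringIO(sample)` argument would be replaced by the open file object.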
mhawke
  • The default separator of my CSV file is also ",". I used "|" just to make my question easier to read. After taking your suggestion I changed my code: `import csv file_name="words.csv" words_meanings = {} with open(file_name, newline='', encoding='utf-16-le') as file: for line in file.readlines(): key, value = line.split(",") words_meanings[key] = value print(words_meanings)` I got this error: "'utf-16-le' codec can't decode byte 0x0a in position 0: truncated data". How do I get rid of it? – Encipher Jan 21 '21 at 19:44
  • @Encipher: post an accurate representation of the problem. When you say "the file looks like" we take that to be what the file actually looks like, not some pretty-printed version of it. And follow the guidance offered; why are you using the `utf-16-le` encoding when I suggested `utf-16`? The file, on the basis of the output in your post, is UTF-16-BE encoded. However, the error message in your comment above suggests that it is not, because it begins with `0x0a`. – mhawke Jan 22 '21 at 00:38
  • I used `utf-16-le` instead of `utf-16` because `utf-16` threw an error. I found in other Stack Overflow posts that `utf-16-le` can eliminate the error. – Encipher Jan 22 '21 at 22:51
  • How are you creating the CSV file? Export from Excel as CSV? You're not trying to use a native Excel file? You can try to determine the character encoding using `chardet` like this: `import chardet; chardet.detect(open('words.csv','rb').read())`. Or check this out: https://stackoverflow.com/questions/3710374/get-encoding-of-a-file-in-windows – mhawke Jan 23 '21 at 01:54
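If installing `chardet` is not an option, a rough first check can be done with the standard library by looking only at the byte-order mark. This is a different and much weaker technique than `chardet`'s statistical detection: it says nothing about files without a BOM. The function name `guess_bom_encoding` is hypothetical:

```python
import codecs

def guess_bom_encoding(raw: bytes):
    """Return a codec name based on a leading BOM, or None if there is none."""
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    # UTF-32 is checked before UTF-16 because BOM_UTF32_LE starts with BOM_UTF16_LE.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'utf-32'
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    return None

# Small in-memory demo; these bytes stand in for the start of a real file.
print(guess_bom_encoding('rekindle,relight\n'.encode('utf-8-sig')))
print(guess_bom_encoding('rekindle,relight\n'.encode('utf-16')))
print(guess_bom_encoding(b'rekindle,relight\n'))
```

For a real file, the first few bytes read in binary mode (for example `open('words.csv', 'rb').read(4)`) could be passed in, and the returned name used as the `encoding` argument of `open`.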

Since no one was able to help me with the answer, I am finally posting it here. I hope this will help others.

file_name = "words.csv"
words_meanings = {}
# utf-8-sig strips the BOM ('\ufeff') from the start of the file
with open(file_name, encoding='utf-8-sig') as file:
    for line in file:
        # each line looks like "key,value"
        key, value = line.rstrip("\n").split(",")
        words_meanings[key] = value
print(words_meanings)

This is the code to transfer a csv to a dictionary. Enjoy!!!
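A variant of the same idea that lets the `csv` module do the splitting may be more forgiving of blank lines, Windows line endings and quoted fields. This is only a sketch; the function name `load_words` and the temporary demo file are assumptions made for illustration:

```python
import csv
import os
import tempfile

def load_words(path):
    """Read a two-column CSV (word, meaning) into a dictionary."""
    words = {}
    # utf-8-sig strips a leading BOM if present; newline='' lets the csv
    # module handle line endings itself.
    with open(path, newline='', encoding='utf-8-sig') as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or incomplete rows
            words[row[0].strip()] = row[1].strip()
    return words

# Demo with a throwaway file that starts with a BOM and uses \r\n endings.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'words.csv')
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        f.write('rekindle,relight\r\npesky,annoying\r\n\r\nverge,border\r\n')
    words_meanings = load_words(path)

print(words_meanings)
```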

Encipher
  • The code that you've posted is essentially the code that @lwashington27 posted? To say that you were not helped is, at best, insincere. You were perhaps unable to understand the help given because your question was so misleading that you could not see how it related to your problem. In future please write your questions to accurately reflect your code, input and output so as to avoid wasting the time of others. – mhawke Jan 26 '21 at 01:16