0

I'm trying to extract a specific column from an Arabic file into another file. This is my code:

# coding=utf-8
import csv
from os import open

file = open('jamid.csv', 'r', encoding='utf-8')
test = csv.reader(file)
f = open('col.txt','w+', 'wb' ,encoding='utf-8')
for row in test:

    if len(row[0].split("\t"))>3 :
         f.write((row[0].split("\t"))[3].encode("utf-8"))

f.close()

The file looks like this:

4   جَوَارِيفُ  جواريف  جرف     اسم 
18  حَرْقى  حرقى    حرق     اسم
24  غَزَواتٌ    غزوات   غزو     اِسْمٌ

I keep getting the same error:

File "col.py", line 5, in <module>  file = open('jamid.csv', 'r', encoding='utf-8')
TypeError: an integer is required (got type str)
Ulrich Eckhardt
Maryam-O
  • It looks like you want the standard `open` function, not `os.open`. The second parameter of `os.open` is an integer flags value, not a mode string. – tdelaney Jun 05 '18 at 05:27
  • I get the same error using the standard open function. – Maryam-O Jun 05 '18 at 05:36
  • Drop the import of open and the extra third parameter ('wb'). – Evgeny Jun 05 '18 at 05:38
  • I added the third parameter because of this error: TypeError: write() argument must be str, not bytes – Maryam-O Jun 05 '18 at 05:41
  • Please get into the habit of extracting a [mcve]. In this particular case, large parts of your code are not necessary, e.g. the final `f.close()`, and they should be eliminated before posting here. – Ulrich Eckhardt Jun 05 '18 at 05:57
  • Thanks for being helpful and kind to me as a beginner, @Ulrich Eckhardt – Maryam-O Jun 05 '18 at 06:00
  • @UlrichEckhardt - `f.close()` is needed in this case. There are some unnecessary lines but that is the bug - a misunderstanding about csv reader results in extra work. I don't see how this example could be any smaller. – tdelaney Jun 05 '18 at 06:02
  • I'm under the impression that everything that happens after the error line is irrelevant for a question here, because it isn't even executed, @tdelaney. Of course, for the real program they may be relevant but the reason to *demand* a MCVE is to put focus on the problem at hand. – Ulrich Eckhardt Jun 05 '18 at 06:06
  • @UlrichEckhardt - I pasted the program and had a running copy in just a few moments. Technically, there were several problems in the code so I guess to you this should have been several questions. But it was all from trying to fix a basic problem. I don't see a problem with how this was posted. – tdelaney Jun 05 '18 at 06:08
  • Is this Python 2 or 3? – tdelaney Jun 05 '18 at 06:09
  • This is Python 3. – Maryam-O Jun 05 '18 at 06:11
  • The difference is like between giving people fish and teaching them how to fish. Isolating a problem and running it in a debugger are key skills every programmer needs to learn in order to be efficient, that's why I mentioned that the very approach to problem solving needs some work. – Ulrich Eckhardt Jun 05 '18 at 06:18
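
The `TypeError` from the question can be reproduced in isolation. The sketch below assumes Python 3 and uses a hypothetical helper, `try_os_open`, to show that `os.open` expects integer flags such as `os.O_RDONLY`, whereas the built-in `open` takes a mode string. The exact wording of the error message differs between Python versions.

```python
import builtins
import os


def try_os_open(path, flags):
    """Call os.open and return the name of the exception raised, or None on success."""
    try:
        fd = os.open(path, flags)
    except (TypeError, OSError) as exc:
        return type(exc).__name__
    os.close(fd)
    return None


# `from os import open` rebinds the name `open` to os.open,
# whose second argument must be an integer flag, not a mode string.
print(os.open is builtins.open)       # False: two different functions
print(try_os_open('jamid.csv', 'r'))  # TypeError, raised while the arguments are parsed
```

Removing the `from os import open` line is enough to get the built-in `open` back.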

3 Answers

1

I see a couple of problems with your code. First, you are calling os.open with the signature of the built-in open function, but the two take different parameters. You can stick with the built-in open. More importantly, you seem to be trying to fix the row coming out of csv.reader by splitting it again on tabs.

My guess is that you saw the entire line in row[0] and tried to work around it. The problem is that the reader splits on commas by default, so you need to supply a different delimiter. This is a bit awkward here because your code splits on a tab but the example shows spaces. I used spaces in my solution, but you can switch that as needed.
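
To see the effect of the delimiter on its own, here is a minimal sketch. It assumes the real file is tab-separated and uses a hypothetical in-memory sample in place of jamid.csv.

```python
import csv
import io

# Hypothetical in-memory sample: two tab-separated rows standing in for jamid.csv.
sample = "4\tجَوَارِيفُ\tجواريف\tجرف\tاسم\n18\tحَرْقى\tحرقى\tحرق\tاسم\n"

# The default delimiter is a comma, so each whole line ends up in row[0].
default_rows = list(csv.reader(io.StringIO(sample, newline='')))

# With the delimiter set to a tab, every field becomes its own column.
tab_rows = list(csv.reader(io.StringIO(sample, newline=''), delimiter='\t'))

print(len(default_rows[0]))  # 1
print(tab_rows[0][3])        # جرف
```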

Finally, you attempted to encode the strings before giving them to the output file object. That object should be opened with the right encoding, and you should simply give it strings.
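
As a small illustration of that last point (a sketch assuming Python 3, writing to a hypothetical scratch file), a text-mode file opened with an encoding accepts str and rejects bytes:

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), 'col.txt')

with open(path, 'w', encoding='utf-8') as out_fp:
    out_fp.write('جرف\n')  # str is accepted; the file object does the encoding
    try:
        out_fp.write('حرق'.encode('utf-8'))  # bytes are rejected by a text-mode file
    except TypeError as exc:
        error_name = type(exc).__name__

with open(path, 'rb') as raw_fp:
    raw = raw_fp.read()

print(error_name)  # TypeError
```

Only the str value reaches the file, stored as UTF-8 bytes.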

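# coding=utf-8
import csv

with open('jamid.csv', 'r', newline='', encoding='utf-8') as in_fp:
    with open('col.txt', 'w', newline='', encoding='utf-8') as out_fp:
        # Wrap each value in a list so the writer emits it as a single column;
        # a bare string would be split into one column per character.
        csv.writer(out_fp).writerows([row[3]] for row in
            csv.reader(in_fp, delimiter=' ', skipinitialspace=True)
            if len(row) > 3)  # row[3] needs at least four fields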
tdelaney
0

You can try using Pandas. I am posting the sample code.

import pandas as pd
df = pd.read_csv("Book1.csv")
# print(df.head(10))
my_col = df['اسم'] #Insert the column name you want to select.
print(my_col)

Note: I hope it handles the Arabic encoding.


import pandas as pd
df = pd.read_csv("filename.csv", encoding='utf-8')
saved_column = df['اسم']  # a pandas Series; convert it to str before writing
with open("col3.txt", "w", encoding='utf-8') as f:
    # write() only accepts str, so join the values one per line
    f.write("\n".join(saved_column.astype(str)))
Hayat
  • I tried pandas but it doesn't work. Maybe it doesn't handle Arabic, or at least my command line doesn't handle Arabic characters. – Maryam-O Jun 05 '18 at 05:46
  • I tried it on mine and it works fine. If you post the error, I may be able to help. – Hayat Jun 05 '18 at 06:06
  • 1
    I was using this code, the same as yours: # coding=utf-8 import pandas as pd df = pd.read_csv("jamid.csv",sep="\r") saved_column = df["جذر"] f= open("col.txt","w+",encoding='utf-8') f.write(saved_column) f.close() – Maryam-O Jun 05 '18 at 06:14
0

You can try using unicodecsv.

How to write UTF-8 in a CSV file

# coding=utf-8
import unicodecsv as csv  # replaces the standard csv module under the same name

file = open('jamid.csv', 'rb')
test = csv.reader(file, delimiter='\t')
f = open('col.txt', 'wb')
for row in test:
    if len(row)>3 :
         f.write(row[3].encode('utf8'))

f.close()
Arun Kumar Nagarajan
  • One more thing: the output values run together. I tried adding f.write('\n') inside the if clause, but that seems to be the wrong thing to do. – Maryam-O Jun 05 '18 at 06:07
  • 1
    You can also learn how to use the with statement when opening files: https://stackoverflow.com/questions/1369526/what-is-the-python-keyword-with-used-for – Arun Kumar Nagarajan Jun 05 '18 at 06:07
  • 2
    This has the same problem as the first: Since the reader defaults to comma delimiter, each row is read as one column and the hack is to split that with tabs. The real solution is to change the delimiter. – tdelaney Jun 05 '18 at 06:17
  • Yes, you are right. I've updated my answer. Sorry for posting without giving it a second thought. – Arun Kumar Nagarajan Jun 05 '18 at 06:27
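
On the run-together output raised in the comments: a file opened with 'wb' only accepts bytes, which is why `f.write('\n')` fails there. The sketch below is one way around it, not the answer's own code. It assumes Python 3, the standard csv module rather than unicodecsv, and tab-separated input; `extract_column` and the sample file are hypothetical.

```python
import csv
import os
import tempfile


def extract_column(in_path, out_path, index=3, delimiter='\t'):
    """Write column `index` of a delimited file to out_path, one value per line."""
    with open(in_path, 'r', newline='', encoding='utf-8') as in_fp, \
         open(out_path, 'w', encoding='utf-8') as out_fp:
        for row in csv.reader(in_fp, delimiter=delimiter):
            if len(row) > index:
                # Text mode takes str, so the line break is a plain '\n'.
                out_fp.write(row[index] + '\n')


# Usage with a hypothetical two-row sample file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'jamid.csv')
dst = os.path.join(workdir, 'col.txt')
with open(src, 'w', newline='', encoding='utf-8') as fp:
    fp.write('4\tجَوَارِيفُ\tجواريف\tجرف\tاسم\n18\tحَرْقى\tحرقى\tحرق\tاسم\n')

extract_column(src, dst)
with open(dst, encoding='utf-8') as fp:
    print(fp.read().splitlines())  # ['جرف', 'حرق']
```

Opening both files in text mode with an explicit encoding keeps every value as str, so no manual encode or decode step is needed.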