-2

I have a dictionary named column_types with the values below.

column_types = {'A': 'pa.int32()',
                'B': 'pa.string()'
               }

I want to pass the dictionary to pyarrow's read_csv function, as below:

from pyarrow import csv
table = csv.read_csv(file_name,
                     convert_options=csv.ConvertOptions(column_types=column_types)
                     )

But this raises an error because the values in the dictionary are strings. The statement below works without any issues.

import pyarrow as pa
from pyarrow import csv
table = csv.read_csv(file_name, convert_options=csv.ConvertOptions(column_types={
                  'A': pa.int32(),
                  'B': pa.string()
               }))

How can I turn the dictionary's string values into the objects they name and pass the result to csv.ConvertOptions?

Gustav Rasmussen
diwakar g
  • Do not pass code in a string and execute it. Instead pass a function object and call it. – alani Jun 16 '20 at 13:19
  • https://stackoverflow.com/questions/701802/how-do-i-execute-a-string-containing-python-code-in-python This answer here may help – Divyang Vashi Jun 16 '20 at 13:21
  • To expand on @alaniwi's comment, to do so, remove the `()`.`column_types = {'A': pa.int32}`, etc.. – Axe319 Jun 16 '20 at 13:21
  • @DivyangVashi this is NOT a case where he wants to use the `exec` function. – Axe319 Jun 16 '20 at 13:23
  • @Axe319 I can't remove braces. Because int32 and string are function of pyarrow. – diwakar g Jun 16 '20 at 13:30
  • @diwakarg removing them turns them into a function object. The `()` on a function calls them and executes them and I'm assuming in this case returns an `int` and a `string`. if you want to hand it the type, you would hand it the object. – Axe319 Jun 16 '20 at 13:40
  • Take the builtin `str` for example. If you want to check if something is of type `str` you would use `isinstance(some_string, str)`. `isinstance(some_string, str())` would throw an error. – Axe319 Jun 16 '20 at 13:42
  • @Axe319 Please share sample code. Difficult to understand what are you saying – diwakar g Jun 16 '20 at 13:50
  • @Axe319 I am generating this column types programmatically. ```column_types['A'] = 'pa.'+datatype+'()'``` by iterating over for loop – diwakar g Jun 16 '20 at 13:53
  • `pa.int32` is a function. Putting `pa.int32()` in your code will call the function and return an int. In your example the class `ConvertOptions` is looking for the type of column. Not an int. So you would hand it `pa.int32`. I used the string example because it's easy to understand. `test = str()` assigns an empty string to `test`. `test = str` assigns the function `str` to `test`. – Axe319 Jun 16 '20 at 13:57
  • If you are doing that you can just as easily assign function objects like `column_types['A'] = datatype` in a loop. just have `datatype = pa.int32` and so on. There's no need to convert to a string and then back. – Axe319 Jun 16 '20 at 13:59
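
To make the comments above concrete: `pa.int32` without parentheses is a factory function, and calling it returns a pyarrow `DataType` object, which is the kind of value the working `column_types` dictionary in the question holds. The following is a minimal sketch, assuming the loop produces plain type names such as `'int32'`. The helper `build_column_types` and the table `PYARROW_FACTORIES` are hypothetical names introduced here for illustration, not part of pyarrow.

```python
try:
    import pyarrow as pa
except ImportError:  # lets the helper below be read and tested without pyarrow
    pa = None


def build_column_types(type_names, factories):
    """Return {column: data_type} by calling the factory registered for each type name.

    `type_names` maps column name -> type name string (e.g. 'int32').
    `factories` maps type name string -> zero-argument callable (e.g. pa.int32).
    """
    column_types = {}
    for column, type_name in type_names.items():
        if type_name not in factories:
            raise ValueError(f"unsupported type name: {type_name!r}")
        # Call the stored function object only here, when the value is needed.
        column_types[column] = factories[type_name]()
    return column_types


if pa is not None:
    # Hypothetical lookup table: function objects, stored without parentheses.
    PYARROW_FACTORIES = {"int32": pa.int32, "string": pa.string}
    column_types = build_column_types({"A": "int32", "B": "string"}, PYARROW_FACTORIES)
    # column_types can now be passed as csv.ConvertOptions(column_types=column_types)
```

The strings never get executed as code in this version; an unknown type name fails with a `ValueError` instead of running arbitrary input.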

2 Answers

0

Two approaches worked for me. You can use either, but I recommend the second one: the first uses eval(), which is risky when the strings come from user input. If the strings are never supplied by a user, method 1 is fine too.

1) USING eval()

import pyarrow as pa
from pyarrow import csv

column_types = {}

column_types['A'] = 'pa.' + 'string' + '()'
column_types['B'] = 'pa.' + 'int32' + '()'

# eval() evaluates each string as a Python expression, so 'pa.int32()' becomes the data type object returned by pa.int32()
final_col_types = {key: eval(val) for key, val in column_types.items()}

# file_name is the path to the CSV file, as in the question
table = csv.read_csv(file_name, convert_options=csv.ConvertOptions(column_types=final_col_types))
print(table)

2) Using a master dictionary, dict_dtypes, that maps each type string to the corresponding pyarrow data type, and then looking up each column's string in dict_dtypes.

import pyarrow as pa
from pyarrow import csv

column_types = {}

column_types['A'] = 'pa.' + 'string' + '()'
column_types['B'] = 'pa.' + 'int32' + '()'

# master dict mapping each type string to the pyarrow data type it names
dict_dtypes = {'pa.string()': pa.string(), 'pa.int32()': pa.int32()}
# final column_types dict, built by looking up each column's string in the master dict
final_col_types = {key: dict_dtypes[val] for key, val in column_types.items()}

# file_name is the path to the CSV file, as in the question
table = csv.read_csv(file_name, convert_options=csv.ConvertOptions(column_types=final_col_types))
print(table)
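
A possible variation on the two methods above, sketched under the assumption that the loop generates only the bare type name (for example `'int32'`) rather than the full `'pa.int32()'` string: `getattr` can look up the factory function on the module by name, and an allowlist keeps arbitrary attribute names out. `resolve_type`, `resolve_column_types`, and `ALLOWED_TYPE_NAMES` are hypothetical names, not pyarrow API. The module is passed in as a parameter so the helpers do not depend on how pyarrow is imported.

```python
# Hypothetical allowlist of pyarrow factory names this sketch is willing to call.
ALLOWED_TYPE_NAMES = frozenset({"int32", "int64", "float64", "string", "bool_"})


def resolve_type(module, type_name, allowed=ALLOWED_TYPE_NAMES):
    """Look up `type_name` on `module` and call it, e.g. 'int32' -> module.int32()."""
    if type_name not in allowed:
        raise ValueError(f"type name not allowed: {type_name!r}")
    factory = getattr(module, type_name)
    return factory()


def resolve_column_types(module, type_names, allowed=ALLOWED_TYPE_NAMES):
    """Build {column: data_type} from {column: type name string}."""
    return {
        column: resolve_type(module, type_name, allowed)
        for column, type_name in type_names.items()
    }


# Usage with pyarrow (assumes pyarrow is installed and file_name is a CSV path):
# import pyarrow as pa
# from pyarrow import csv
# final_col_types = resolve_column_types(pa, {"A": "string", "B": "int32"})
# table = csv.read_csv(file_name, convert_options=csv.ConvertOptions(column_types=final_col_types))
```

Compared with method 2, this avoids maintaining a separate entry per type string, at the cost of relying on the type name matching the pyarrow function name exactly.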
Kaustubh Lohani
  • The problem is I am generating this column types programmatically. Like below ```column_types['A'] = 'pa.'+datatype+'()'``` by iterating over for loop – diwakar g Jun 16 '20 at 13:44
  • Ok sorry for that. I have updated the answer. Check and tell me if that works for you. – Kaustubh Lohani Jun 16 '20 at 14:36
-1

Why don't we use something like this:

import pyarrow as pa
from pyarrow import csv

column_types = {'A': pa.int32(),
                'B': pa.string()}

table = csv.read_csv(file_name,
                     convert_options=csv.ConvertOptions(column_types=column_types))
Déjà vu
  • The problem is I am generating this column types programmatically. Like below ```column_types['A'] = 'pa.'+datatype+'()'``` by iterating over for loop – diwakar g Jun 16 '20 at 13:41