
After converting my CSV to a list of dictionaries with pandas, a sample of the data looks like this:

[{'Name': '1234', 'Age': 20},
 {'Name': 'Alice', 'Age': 30.1},
 {'Name': '5678', 'Age': 41.0},
 {'Name': 'Bob 1', 'Age': 14},
 {'Name': '!@#$%', 'Age': 65}]

My goal is to validate whether the values in each column are strings. I'm trying to use the pandera or schema libraries for this, as the CSV may contain a million rows. Therefore, I am trying to convert the data as follows, so that digit-only names become integers:

[{'Name': 1234, 'Age': 20},
 {'Name': 'Alice', 'Age': 30.1},
 {'Name': 5678, 'Age': 41.0},
 {'Name': 'Bob 1', 'Age': 14},
 {'Name': '!@#$%', 'Age': 65}]

After converting the CSV data to a list of dicts, I use the following code to check whether Name is a string.

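import pandas as pd
from schema import Schema, Or

# 'Age' holds both ints (20) and floats (30.1), so accept either type
schema = Schema([{'Name': str,
                  'Age':  Or(int, float)}])

# `records` is the list of dicts built from the CSV (avoid shadowing the built-in `dict`)
validated = schema.validate(records)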

Is it possible?

petezurich

1 Answer


Is it possible?

For sure. You can use the int constructor to convert those strings to integers where possible.

for element in list_:
    try:
        element["Name"] = int(element["Name"])
    except ValueError:
        pass

Another way, which can be faster when most names are not numeric, is to use the isdigit method of str:

for element in list_:
    if element["Name"].isdigit(): # Otherwise no need to convert
        element["Name"] = int(element["Name"])

This way you avoid the try/except block entirely.
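For completeness, here is a minimal self-contained sketch of the same idea applied to the sample data from the question. The helper name `convert_numeric_names` and the `key` parameter are my own, not from any library. It uses `str.isdecimal` rather than `isdigit` as a cautious assumption: `isdigit` also returns True for characters such as '²', which `int` cannot parse.

```python
def convert_numeric_names(records, key="Name"):
    """Return a new list of dicts where digit-only strings under `key` become ints.

    Hypothetical helper for illustration; the input records are not modified.
    """
    converted = []
    for record in records:
        new_record = dict(record)  # shallow copy so the caller's data is untouched
        value = new_record.get(key)
        # Only convert strings made up entirely of decimal digits
        if isinstance(value, str) and value.isdecimal():
            new_record[key] = int(value)
        converted.append(new_record)
    return converted


sample = [
    {"Name": "1234", "Age": 20},
    {"Name": "Alice", "Age": 30.1},
    {"Name": "5678", "Age": 41.0},
    {"Name": "Bob 1", "Age": 14},
    {"Name": "!@#$%", "Age": 65},
]

result = convert_numeric_names(sample)
```

Note that the two approaches are not equivalent on every input: a string like "-12" is converted by the try/except version (`int("-12")` succeeds) but left alone by the digit check, so pick the one that matches what you consider a "numeric" name.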

FLAK-ZOSO