Iterating through a CSV to determine the type of data

Question

I am taking a beginner's course in Python and need to do the following. There is a CSV file with 10 columns and 200 rows. Each value is either a str, an int, or a float.

Example input:

id  gender  age marital location    income  intelliscore    emotiscore
51  F   46  M   0   15100   531 555
52  M   29  M   2   14200   673 633
53  M   25  S   0   22200   742 998
54  M   36  M   2   1000    677 646
55  F   99  S   0   10600   608 998

What I need to do is create another CSV file in which these values are replaced by their types. The desired result would be:

'string', 'string', 'string', 'string', 'string', 'string', 'string', 'string'
'int', 'string', 'int', 'string', 'int', 'int', 'int', 'int'
'int', 'string', 'int', 'string', 'int', 'int', 'int', 'int'
'int', 'string', 'int', 'string', 'int', 'int', 'int', 'int'
'int', 'string', 'int', 'string', 'int', 'int', 'int', 'int'

The code I am currently using is:

def csvfields2types(self, csvfile):
    csvtypes = []
    for line in csvfile:
        row = []
        for variable in line:
                if variable == str:
                    row.append('string')
                elif variable == float:
                    row.apend('float')
                elif variable == int:
                    row.append('int')
                else:
                    row.append('huh?')
        csvtypes.append(row)
    return csvtypes

It just returns a list with 'huh?'.

score 0 · Answer 1 · edited Dec 20 '19 at 22:13

You're checking whether the value of the variable equals str. You want to check whether its type is str instead:

if type(variable) == str:

edited Dec 20 '19 at 22:13 by Nazim Kerimbekov
answered Dec 20 '19 at 22:05 by MaMaG

Better yet, to simplify your code, you could do: `for variable in line: row.append(type(variable))` – MaMaG Dec 20 '19 at 22:06
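One caveat about the type check suggested above: fields read from a CSV file arrive as text, so checking `type(variable)` will most likely report str for every field, including the numeric-looking ones. The sketch below illustrates this with a small in-memory sample; the helper name `field_types` and the sample data are made up for illustration and are not part of the question's code.

```python
import csv
import io

def field_types(text):
    """Return the Python type name of every field that csv.reader yields."""
    reader = csv.reader(io.StringIO(text))
    return [[type(variable).__name__ for variable in line] for line in reader]

# Hypothetical in-memory sample standing in for the real file.
sample = "id,gender,age\n51,F,46\n"

# Every field comes back as 'str', even "51" and "46".
print(field_types(sample))
```

To tell numbers from text, the string therefore has to be parsed, for example by attempting a conversion.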

finefoot · Answer 2 · 2019-12-20T22:26:47.390

Are you familiar with the EAFP principle ("easier to ask for forgiveness than permission") in Python? If not, have a look at this question: What is the EAFP principle in Python?

We could do something similar here: use try to "test" the type by assuming the string represents a value of that type and converting it. If the conversion works, we have found a matching type. If it fails, we try the next type. Make sure to begin with the most restrictive type, int (since every integer string could also be interpreted as a float ending in .0), then float, and finally str.

Putting that into a function could look something like this:

def check_type(input_string):
    try:
        int(input_string)
        return int
    except ValueError:
        pass
    try:
        float(input_string)
        return float
    except ValueError:
        pass
    return str

Some examples:

>>> check_type("10")
<class 'int'>
>>> check_type("10.1")
<class 'float'>
>>> check_type("A")
<class 'str'>

By the way, don't get confused by scientific notation which is also acceptable float input:

>>> check_type("1e1")
<class 'float'>
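Scientific notation is not the only input that may be surprising. As a rough sketch, the loop below runs the same try-in-order idea over a few edge cases; the list of examples is an assumption about what might appear in a file, not something taken from the question. `check_type` is repeated here (in a slightly more compact form) only so that the snippet runs on its own.

```python
def check_type(input_string):
    # Try the most restrictive type first, then fall back.
    for candidate in (int, float):
        try:
            candidate(input_string)
            return candidate
        except ValueError:
            pass
    return str

# Hypothetical edge cases worth knowing about.
examples = ["10", " 10 ", "1_000", "10.0", "nan", "inf", "0x1A", ""]

for text in examples:
    print(repr(text), "->", check_type(text).__name__)
```

Surrounding whitespace and (since Python 3.6) underscores between digits are accepted by int(), and float() accepts "nan" and "inf". Hexadecimal text and the empty string fail both conversions and end up as str. If any of these should be classified differently, an extra check before the conversions would be needed.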

oppressionslayer · Answer 3 · 2019-12-20T23:54:59.693

If you create a pandas DataFrame from your file, you can do this:

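import re  # needed for re.findall below
import pandas as pd

# sep=r"\s+" splits on any run of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv('out146.txt', sep=r"\s+")
for col in df:
    # Replace each value with the quoted name of its Python type, e.g. 'int'
    df[col] = df[col].apply(lambda x: f"""'{re.findall(r"'(.*?)'",str(type(x))).pop()}'""")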

output:

      id gender    age marital location income intelliscore emotiscore
0  'int'  'str'  'int'   'str'    'int'  'int'        'int'      'int'
1  'int'  'str'  'int'   'str'    'int'  'int'        'int'      'int'
2  'int'  'str'  'int'   'str'    'int'  'int'        'int'      'int'
3  'int'  'str'  'int'   'str'    'int'  'int'        'int'      'int'
4  'int'  'str'  'int'   'str'    'int'  'int'        'int'      'int'

score 0 · Answer 4 · answered Dec 21 '19 at 00:57

Assuming that the input CSV file (csvfile.csv) is delimited by exactly one space character (" "), you could define two functions to determine whether each element of each row is an integer or a float (if it is neither, it is treated as a string) and use Python's csv module.

A working example that writes the desired outcome to a new output.csv file looks like this:

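import csv

def isint(n):
    """Return True if n can be parsed as an integer."""
    try:
        int(n)
        return True
    except ValueError:
        return False

def isfloat(n):
    """Return True if n can be parsed as a float."""
    try:
        float(n)
        return True
    except ValueError:
        return False

# newline="" is what the csv module recommends; the with block closes both files
with open("csvfile.csv", newline="") as infile, \
        open("output.csv", "w", newline="") as outfile:
    out = csv.writer(outfile, delimiter=",")
    for line in csv.reader(infile, delimiter=" "):
        row = []
        for variable in line:
            if isint(variable):  # test int first: every int string is also a valid float
                row.append('int')
            elif isfloat(variable):
                row.append('float')
            else:
                row.append('str')
        out.writerow(row)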
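The sample in the question looks as if it may be separated by tabs or several spaces rather than exactly one space. Under that assumption, a variant that splits on any run of whitespace could look like the sketch below. The names `classify` and `write_types` and the in-memory sample are hypothetical, and the label 'string' is used because that is what the question's desired output shows.

```python
import csv
import io

def classify(value):
    """Return 'int', 'float', or 'string' for one text field."""
    for label, convert in (("int", int), ("float", float)):
        try:
            convert(value)
            return label
        except ValueError:
            continue
    return "string"

def write_types(source, target):
    """Split each line of source on whitespace and write the labels to target as CSV."""
    writer = csv.writer(target)
    for line in source:
        fields = line.split()  # handles tabs and repeated spaces
        if fields:             # skip blank lines
            writer.writerow([classify(field) for field in fields])

# Hypothetical in-memory data; real code would pass open file objects instead.
source = io.StringIO("id  gender\tage\n51  F   46\n52  M   29.5\n")
target = io.StringIO()
write_types(source, target)
print(target.getvalue())
```

With real files, `write_types` would be called with the two file objects, the output file opened with newline="" as in the example above.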
