Let me explain my problem a bit better.

I have a .csv that contains columns with 'False' or 'True' and others with 'true' and 'false' (note the difference in capitalization).

I'm fine with pandas' read_csv function converting the former directly to the Python booleans True and False.

However, I want to keep 'true' and 'false' as strings, but they too are converted to the Python booleans True and False.

Is there a way to keep them as strings?


I know about the `true_values` optional keyword, which lets you specify additional values to consider as True, but it only adds values; it does not replace the set of values considered True.

I came across this other StackOverflow post, but I can't apply it to my problem: I want some columns with True and some with 'true', and the only way to know which columns need to be strings is to read the data first (at which point it is already converted, hence this post).
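A minimal reproduction of the behavior (the column names and values here are just for illustration):

```python
import io
import pandas as pd

# two columns: 'a' holds 'True'/'False', 'b' holds lowercase 'true'/'false'
csv = io.StringIO("a,b\nTrue,true\nFalse,false\n")
df = pd.read_csv(csv)

# both columns are inferred as bool, so the lowercase strings are lost
print(df.dtypes)
```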

Thanks in advance for any help,

VictorGalisson
  • So you want to have a column with boolean and strings? – Dani Mesejo Nov 27 '19 at 10:30
  • 1
    Have you tried to define the `dtype` of these columns to `str`? – Aryerez Nov 27 '19 at 10:33
  • @DanielMesejo no, each column has a separate type: when there is 'True' or 'False', I know there are only proper boolean values. When there is 'true' or 'false', it's only strings. – VictorGalisson Nov 27 '19 at 10:40
  • @Aryerez the issue is that there is quite a lot of columns and the only way to know which columns needs to be a string or not is to read the data. And I didn't find a way to programmatically define the dtype of those specific columns. – VictorGalisson Nov 27 '19 at 10:42
  • 1
    @VictorGalisson So define them all as strings, and then read them in python. – Aryerez Nov 27 '19 at 10:45
  • @Aryerez getting them all as strings is not what I really want but this seems to be the most convenient solution. – VictorGalisson Nov 27 '19 at 11:03
  • 1
    if you plan on doing the type inference by yourself, maybe [this](https://stackoverflow.com/questions/2859674/converting-python-list-of-strings-to-their-type) is helpful – FObersteiner Nov 27 '19 at 11:21

1 Answer

Illustrating Aryerez's comment, you can specify the dtype. Given a CSV 'test.csv' containing

a,b
True,true
False,false

you could call

import pandas as pd
df = pd.read_csv('test.csv', dtype={'a': bool, 'b': str})

giving you

df
Out[5]: 
       a      b
0   True   true
1  False  false

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
a    2 non-null bool
b    2 non-null object
dtypes: bool(1), object(1)
memory usage: 146.0+ bytes

What you could also do is map the column back to str, e.g. as in the linked post:

df['b'] = df['b'].map({True: 'true', False: 'false'})

However, in both cases you need prior information about the data: either the name of the specific column, or the column's dtype.
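If you don't know the column names in advance, you could combine the suggestions from the comments into a sketch like this: read everything as str so nothing is auto-converted, then convert back to bool only the columns whose values are exactly 'True'/'False' (the in-memory CSV here is just for illustration):

```python
import io
import pandas as pd

csv = io.StringIO("a,b\nTrue,true\nFalse,false\n")

# read everything as strings first so pandas performs no bool inference
df = pd.read_csv(csv, dtype=str)

# convert only columns whose values are exactly 'True' / 'False'
for col in df.columns:
    if set(df[col].dropna().unique()) <= {'True', 'False'}:
        df[col] = df[col] == 'True'

print(df.dtypes)  # a -> bool, b -> object
```

This does require a second pass over each column's values, but it avoids having to know the column names up front.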

FObersteiner
  • Thanks for the answer, but unfortunately it means I need to find out first which columns need to be strings; my issue is that to do so, I need to read the data first (which is then already converted). – VictorGalisson Nov 27 '19 at 10:45