I have a two-column dataset that I would like to reshape.
Consider this sample df:

import pandas as pd

df = pd.DataFrame([
    ['Alex', 'Apple'], ['Bob', 'Banana'], ['Clark', 'Citrus'], ['Diana', 'Banana'],
    ['Elisa', 'Apple'], ['Frida', 'Citrus'], ['George', 'Citrus'], ['Hanna', 'Banana'],
], columns=['Name', 'Fruit'])

I would like to have four columns: Name, Apple, Banana and Citrus, where the last three are booleans (True/False).
I've looked into unstack, but it's not really what I am looking for.

Mactilda

3 Answers

I think this should be a good use case for get_dummies:

df.set_index('Name')['Fruit'].str.get_dummies().astype(bool).reset_index()

     Name  Apple  Banana  Citrus
0    Alex   True   False   False
1     Bob  False    True   False
2   Clark  False   False    True
3   Diana  False    True   False
4   Elisa   True   False   False
5   Frida  False   False    True
6  George  False   False    True
7   Hanna  False    True   False

In a similar vein:

pd.concat([df['Name'], df['Fruit'].str.get_dummies().astype(bool)], axis=1)

     Name  Apple  Banana  Citrus
0    Alex   True   False   False
1     Bob  False    True   False
2   Clark  False   False    True
3   Diana  False    True   False
4   Elisa   True   False   False
5   Frida  False   False    True
6  George  False   False    True
7   Hanna  False    True   False
cs95
  • Great! Thank you - still new to Python (I'm an R girl). You don't happen to know how to create a matrix from the new df where True/False is 1/0? – Mactilda Mar 13 '19 at 17:17
  • @Mactilda Just remove the `astype(bool)` from my code everywhere. I assumed you wanted True/False since you mentioned booleans, but representing the result as 0/1s is more straightforward. – cs95 Mar 13 '19 at 17:18
  • Thanks! I know how to drop the first column. Is there any way I can drop the column headers as well to make it a matrix? – Mactilda Mar 13 '19 at 17:31
  • @Mactilda Do you want an array or a DataFrame without column names? If the former, you can use anky_91's suggestion. Otherwise, do `df.columns = range(len(df.columns))` – cs95 Mar 13 '19 at 17:33
  • @Mactilda Alternatively, I have an answer [here](https://stackoverflow.com/a/54508052/4909087) that explains how to convert a DataFrame to a matrix. – cs95 Mar 13 '19 at 17:34
  • @coldspeed I have a related question that I can't seem to find an answer to. If I wanted to have an x where all the True values are and nothing where the False values are, is there an easy command for that? – Mactilda Mar 18 '19 at 08:59
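
The follow-up questions in the comments (0/1 values, a plain matrix without headers, and an x in place of True) could be handled along these lines. This is only a sketch, not part of the original answer; the names `flags`, `ints`, `matrix` and `marks` are made up for illustration, and `to_numpy()` assumes pandas 0.24 or newer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['Alex', 'Apple'], ['Bob', 'Banana'], ['Clark', 'Citrus'], ['Diana', 'Banana'],
    ['Elisa', 'Apple'], ['Frida', 'Citrus'], ['George', 'Citrus'], ['Hanna', 'Banana'],
], columns=['Name', 'Fruit'])

# Boolean indicator columns, indexed by Name
flags = df.set_index('Name')['Fruit'].str.get_dummies().astype(bool)

# 0/1 integers instead of True/False
ints = flags.astype(int)

# Plain NumPy array: no index and no column headers
matrix = ints.to_numpy()

# 'x' where the flag is True, empty string where it is False
marks = pd.DataFrame(np.where(flags, 'x', ''),
                     index=flags.index, columns=flags.columns)
```

Keeping `Name` as the index means the conversions only ever touch the fruit columns; call `reset_index()` on `ints` or `marks` if `Name` should be a regular column again.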

You can use the below:

df[['Name']].join(pd.get_dummies(df.Fruit).astype(bool))

     Name  Apple  Banana  Citrus
0    Alex   True   False   False
1     Bob  False    True   False
2   Clark  False   False    True
3   Diana  False    True   False
4   Elisa   True   False   False
5   Frida  False   False    True
6  George  False   False    True
7   Hanna  False    True   False
anky
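
A possible one-call variant of the `pd.get_dummies` approach is to pass the whole frame and encode only the `Fruit` column. This is a sketch under assumptions rather than anything from the answers above: it relies on the `columns`, `prefix`, `prefix_sep` and `dtype` parameters of `pd.get_dummies` (the `dtype` parameter needs a reasonably recent pandas), and the name `out` is made up.

```python
import pandas as pd

df = pd.DataFrame([
    ['Alex', 'Apple'], ['Bob', 'Banana'], ['Clark', 'Citrus'], ['Diana', 'Banana'],
    ['Elisa', 'Apple'], ['Frida', 'Citrus'], ['George', 'Citrus'], ['Hanna', 'Banana'],
], columns=['Name', 'Fruit'])

# Encode only Fruit; the empty prefix and separator keep the bare fruit names
# as column labels, and dtype=bool requests True/False instead of 0/1.
out = pd.get_dummies(df, columns=['Fruit'], prefix='', prefix_sep='', dtype=bool)
```

Columns that are not encoded (here `Name`) are kept in front of the indicator columns, so no `join` or `concat` step is needed.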

crosstab seems to work fine here too:

pd.crosstab(df.Name, df.Fruit).astype(bool).reset_index()

Fruit    Name  Apple  Banana  Citrus
0        Alex   True   False   False
1         Bob  False    True   False
2       Clark  False   False    True
3       Diana  False    True   False
4       Elisa   True   False   False
5       Frida  False   False    True
6      George  False   False    True
7       Hanna  False    True   False
BENY
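
Two details of the crosstab route may be worth knowing: the column axis keeps the label `Fruit` (visible in the output above), and repeated names collapse into a single row because crosstab counts (Name, Fruit) pairs. The sketch below illustrates both on hypothetical data in which Alex appears twice; the frame `dup` and the names `ct` and `gd` are made up for this example.

```python
import pandas as pd

# Hypothetical data: Alex appears twice, with two different fruits
dup = pd.DataFrame(
    [['Alex', 'Apple'], ['Bob', 'Banana'], ['Alex', 'Banana']],
    columns=['Name', 'Fruit'],
)

# crosstab aggregates, so both Alex rows end up in one output row
ct = pd.crosstab(dup['Name'], dup['Fruit']).astype(bool).reset_index()
ct.columns.name = None  # drop the leftover "Fruit" label on the column axis

# get_dummies keeps one output row per input row
gd = dup[['Name']].join(pd.get_dummies(dup['Fruit']).astype(bool))
```

Here `ct` has two rows (Alex is True for both Apple and Banana) while `gd` has three, so the two approaches only agree when every name is unique, as in the question's data.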