
The original .csv file -

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,FALSE
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,FALSE
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,FALSE

My Python code using df.iterrows() -

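import pandas as pd

df = pd.read_csv('pokemon_data.csv')
with open('output.txt', 'w') as f:
    for index, row in df.iterrows():
        # row label followed by the row's string representation
        row_i = str(index) + str(row)
        # separate consecutive rows with a blank line
        f.write(row_i + '\n\n')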

I've learned that df.iterrows() should be avoided because it becomes very slow on large datasets.

How, then, can I pivot the columns of a pandas DataFrame into the innermost index level and get the following result without using df.iterrows()?

0 #                     1
Name          Bulbasaur
Type 1            Grass
Type 2           Poison
HP                   45
Attack               49
Defense              49
Sp. Atk              65
Sp. Def              65
Speed                45
Generation            1
Legendary         False

1 #                   2
Name          Ivysaur
Type 1          Grass
Type 2         Poison
HP                 60
Attack             62
Defense            63
Sp. Atk            80
Sp. Def            80
Speed              60
Generation          1
Legendary       False

2 #                    3
Name          Venusaur
Type 1           Grass
Type 2          Poison
HP                  80
Attack              82
Defense             83
Sp. Atk            100
Sp. Def            100
Speed               80
Generation           1
Legendary        False
Sherman Chen

3 Answers


Apply str() to each row with df.agg(..., axis='columns') to get its string representation, then concatenate the results with .str.cat:

>>> print(df.agg(str, axis='columns').str.cat(sep='\n\n'))
#                     1
Name          Bulbasaur
Type 1            Grass
Type 2           Poison
HP                   45
Attack               49
Defense              49
Sp. Atk              65
Sp. Def              65
Speed                45
Generation            1
Legendary         False
Name: 0, dtype: object

#                   2
Name          Ivysaur
Type 1          Grass
Type 2         Poison
HP                 60
Attack             62
Defense            63
Sp. Atk            80
Sp. Def            80
Speed              60
Generation          1
Legendary       False
Name: 1, dtype: object

#                    3
Name          Venusaur
Type 1           Grass
Type 2          Poison
HP                  80
Attack              82
Defense             83
Sp. Atk            100
Sp. Def            100
Speed               80
Generation           1
Legendary        False
Name: 2, dtype: object
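Since the question writes to output.txt, here is a minimal sketch of the same agg + str.cat idea writing to that file. The three sample rows from the question are inlined via io.StringIO (an assumption, standing in for pokemon_data.csv) so the snippet runs on its own:

```python
import io

import pandas as pd

# Sample rows from the question, inlined in place of pokemon_data.csv
CSV = """#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,FALSE
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,FALSE
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,FALSE
"""

df = pd.read_csv(io.StringIO(CSV))

# str() of each row Series, then join the per-row blocks with a blank line
text = df.agg(str, axis='columns').str.cat(sep='\n\n')

# A single write call instead of one write per row
with open('output.txt', 'w') as f:
    f.write(text)
```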

If you want to keep the index number, you can use reset_index() and then strip the leading "index" label from each string representation:

>>> print(df.reset_index().agg(str, axis='columns').str.replace(r'^index\s*', '', regex=True).str.cat(sep='\n\n'))
0
#                     1
Name          Bulbasaur
Type 1            Grass
Type 2           Poison
HP                   45
Attack               49
Defense              49
Sp. Atk              65
Sp. Def              65
Speed                45
Generation            1
Legendary         False
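As an alternative to the regex, the row labels can be prefixed by aligning a Series of labels with the per-row strings on their shared index. This is a minimal sketch, not part of the answer above; the inlined sample CSV is again an assumption standing in for pokemon_data.csv:

```python
import io

import pandas as pd

CSV = """#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,FALSE
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,FALSE
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,FALSE
"""

df = pd.read_csv(io.StringIO(CSV))

# One string per row; the result keeps the original row labels (0, 1, 2) as its index
blocks = df.agg(str, axis='columns')

# Turn the index into a Series of label strings and prepend it;
# the addition aligns on the shared index, so no regex is needed
labelled = blocks.index.to_series().astype(str) + '\n' + blocks

text = labelled.str.cat(sep='\n\n')
print(text)
```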
Cimbali
  • [Cimbali](https://stackoverflow.com/users/1387346/cimbali), I just found that if I use 'df.agg' and 'str.cat' and a cell contains a long sentence, the sentence gets truncated. – Sherman Chen Aug 12 '21 at 01:56
  • It's the same mechanism as with @HenryEcker's answer, I suppose. Your new question should solve that. – Cimbali Aug 12 '21 at 07:11

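We can try stack + to_string. stack() moves the columns into the innermost index level, and to_string() writes the resulting Series straight to the file: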

df.stack().to_string('output.txt')

output.txt:

0  #                     1
   Name          Bulbasaur
   Type 1            Grass
   Type 2           Poison
   HP                   45
   Attack               49
   Defense              49
   Sp. Atk              65
   Sp. Def              65
   Speed                45
   Generation            1
   Legendary         False
1  #                     2
   Name            Ivysaur
   Type 1            Grass
   Type 2           Poison
   HP                   60
   Attack               62
   Defense              63
   Sp. Atk              80
   Sp. Def              80
   Speed                60
   Generation            1
   Legendary         False
2  #                     3
   Name           Venusaur
   Type 1            Grass
   Type 2           Poison
   HP                   80
   Attack               82
   Defense              83
   Sp. Atk             100
   Sp. Def             100
   Speed                80
   Generation            1
   Legendary         False
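One caveat worth checking: in pandas versions before 3.0, stack() drops missing values by default, so a row with an empty Type 2 cell would lose that line in output.txt. A minimal sketch that sidesteps this regardless of version by filling missing cells with an empty string first; the Charmander row is an illustrative addition (not from the question) with a deliberately empty Type 2:

```python
import io

import pandas as pd

# Illustrative data: the second row has an empty Type 2 cell
CSV = """#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,FALSE
4,Charmander,Fire,,39,52,43,60,50,65,1,FALSE
"""

df = pd.read_csv(io.StringIO(CSV))

# fillna('') turns NaN into a blank string, so stack() keeps every field
stacked = df.fillna('').stack()

# Series.to_string accepts a file path as its first argument (buf)
stacked.to_string('output.txt')
```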
Henry Ecker
  • [@Henry Ecker](https://stackoverflow.com/users/15497888/henry-ecker), `stack()`, `to_string('output.txt')` - concise, powerful, and beautiful! – Sherman Chen Aug 09 '21 at 01:48
  • [Henry Ecker](https://stackoverflow.com/users/15497888/henry-ecker), I just found that if I use 'to_string' and a cell contains a long sentence, the sentence gets truncated. – Sherman Chen Aug 12 '21 at 01:53
  • It seems unlikely that `to_string` would behave that way. The `max_colwidth` is by default unlimited. I also just tested it with a column of 10,000-character-long strings and it printed out correctly. You might consider double checking your source data, or asking a [new question](https://stackoverflow.com/questions/ask) with a [MRE](https://stackoverflow.com/help/minimal-reproducible-example) which can be used to reproduce this new truncation issue. – Henry Ecker Aug 12 '21 at 02:06
  • Henry Ecker, certainly. I have asked [a new question](https://stackoverflow.com/questions/68751179/long-text-cells-got-truncated-when-using-df-stack-to-stringo-txt). – Sherman Chen Aug 12 '21 at 03:24
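A plausible cause of the truncation reported in these comments, offered as an assumption rather than a confirmed diagnosis: `DataFrame.to_string` has a `max_colwidth` parameter that defaults to no limit, but `Series.to_string` (used after `stack()`) and `str(Series)` (used by the agg and apply answers) have no such parameter and appear to fall back on the global `display.max_colwidth` option, which defaults to 50 characters. A minimal sketch that lifts that option temporarily with `pd.option_context`; the long note is hypothetical data:

```python
import pandas as pd

# Hypothetical data: one cell holds a sentence much longer than 50 characters
long_note = ' '.join(['long'] * 50)
s = pd.DataFrame({'Name': ['Bulbasaur'], 'Notes': [long_note]}).stack()

# Series.to_string has no max_colwidth argument, so remove the global display
# limit only inside this block (None means "no limit" in pandas 1.0 and later)
with pd.option_context('display.max_colwidth', None):
    text = s.to_string()

print(text)
```

The same `with` block can wrap `str(row)` or `row.to_string()` calls in the other approaches.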

You could use df.apply(..., axis=1) and write each row with Series.to_string():

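import pandas as pd

df = pd.read_csv('pokemon_data.csv')
with open('output.txt', 'w') as f:
    def write_pokemon(pokemon):
        # pokemon is one row of df, passed in as a Series
        f.write(pokemon.to_string())
        # blank line between consecutive rows
        f.write('\n\n')

    # axis=1 calls write_pokemon once per row (not per column)
    df.apply(write_pokemon, axis=1)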
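Note that apply(axis=1) still calls the Python function once per row, so it is not vectorized; the main gain over iterrows() is style. A minimal variant (a sketch, not the answerer's code) returns each block as a string, prefixes the row label to match the desired output, and writes the file once. The inlined sample CSV and the helper name `format_pokemon` are assumptions for illustration:

```python
import io

import pandas as pd

CSV = """#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,FALSE
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,FALSE
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,FALSE
"""

df = pd.read_csv(io.StringIO(CSV))


def format_pokemon(pokemon: pd.Series) -> str:
    # pokemon.name is the row label (0, 1, 2, ...) that apply(axis=1) attaches
    return f'{pokemon.name}\n{pokemon.to_string()}'


# Still one Python call per row, but all text is joined and written once
text = '\n\n'.join(df.apply(format_pokemon, axis=1))

with open('output.txt', 'w') as f:
    f.write(text)
```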
inu
  • [maanas](https://stackoverflow.com/users/8109114/maanas), I just found that if I use 'df.apply(axis=1)' and a cell contains a long sentence, the sentence gets truncated. – Sherman Chen Aug 12 '21 at 01:59