I'm using pandas to tidy up some live sport data and rank players. When I read in the team stats, however, the second team's stats are all converted to floats, even though they are integers in the data source.

How can I convert these floats back to integers, bearing in mind that some rows should remain floats (e.g. Accuracy, which represents a percentage)?

Here's what I have so far:

import pandas as pd
teams_df = pd.read_excel("GAME_STATS.xlsm", skiprows=8, nrows=12, usecols=[0,1,2])
teams_df.to_csv("team_stats.csv") 

This writes to a CSV which results in:

+----------------------------------------------------------+
|    | Unnamed: 0 | VS.                       | Unnamed: 2 |
+----------------------------------------------------------+
| 0  | 342        | Disposals                 | 339.0      | 
| 1  | 232        | Kicks                     | 214.0      | 
| 2  | 110        | Handballs                 | 125.0      | 
| 3  | 23         | Hit Outs                  | 45.0       | 
| 4  | 27         | Frees                     | 25.0       | 
| 5  | S.A.       |                           |            | 
| 6  |            | VS                        |            | 
| 7  | 23         | Scoring Shots             | 18.0       |
| 8  | 0.5652     | Accuracy                  | 0.5        |
| 9  | 26.3       | Disposal Per Goal         | 37.7       |
| 10 | 14.9       | Disposal Per Scoring Shot | 18.8       |
| 11 | 44         | Inside 50s                | 43.0       |
+----------------------------------------------------------+

As you can see for the team on the right (the Unnamed: 2 column), rows 0-4 as well as 7 and 11 should all be integers, but pandas returns them as floats. Is there a way I can convert these particular rows back to ints, or stop this from happening in the first place?

I've tried using .astype(int), but I only get errors.

When printing as dict this is the output:

print(teams_df.to_dict())

{'Unnamed: 0': {0: 342, 1: 232, 2: 110, 3: 23, 4: 27, 5: 'S.A.', 6: nan, 7: 23, 8: 0.5652, 9: 26.3, 10: 14.9, 11: 44}, 'VS': {0: 'Disposals', 1: 'Kicks', 2: 'Handballs', 3: 'Hit Outs', 4: 'Frees', 5: nan, 6: 'VS', 7: 'Scoring Shots', 8: 'Accuracy', 9: 'Disposal Per Goal', 10: 'Disposal Per Scoring Shot', 11: 'Inside 50s'}, 'Unnamed: 2': {0: 339.0, 1: 214.0, 2: 125.0, 3: 45.0, 4: 25.0, 5: nan, 6: nan, 7: 18.0, 8: 0.5, 9: 37.7, 10: 18.8, 11: 43.0}}
RickPeck
  • Posting your dataframe as a code-friendly snippet will make this and future questions easier to answer. Instead of `team_df.to_csv(...)` use `print(team_df.to_dict())` and include the output in your question. – Henry Ecker Apr 27 '21 at 01:11
  • @HenryEcker I've made an edit accordingly, thanks. – RickPeck Apr 27 '21 at 01:35
  • Turn the number into strings. This is probably the easiest way to deal with mixed types in Pandas just for printing purpose. – Fanchen Bao Apr 25 '23 at 22:55

1 Answer

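A pandas column holds a single dtype. Because the 'Unnamed: 2' column contains NaN and fractional values such as 0.5, pandas stores the whole column as float64, which is why 339 shows up as 339.0. It is also why .astype(int) fails: NaN cannot be converted to an integer. You can cast the column to the object dtype to store Python objects (and hence mixed types). This post may be helpful.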
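As a rough sketch of the object-dtype approach, assuming the goal is only to make whole numbers display as ints: the series below is a hand-typed stand-in for the 'Unnamed: 2' column from the question, and `to_int_if_whole` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

# Stand-in for the 'Unnamed: 2' column (float64 because of the NaN rows).
col = pd.Series(
    [339.0, 214.0, 125.0, 45.0, 25.0, np.nan, np.nan, 18.0, 0.5, 37.7, 18.8, 43.0]
)


def to_int_if_whole(value):
    """Return an int for whole-number floats; leave NaN and fractions alone."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Build the result with an explicit object dtype so pandas does not
# infer float64 again from the mix of ints and floats.
mixed = pd.Series(
    [to_int_if_whole(v) for v in col.tolist()],
    index=col.index,
    dtype=object,
)

print(mixed)
```

The same idea could be applied to a DataFrame column by assigning the result back, e.g. `teams_df["Unnamed: 2"] = mixed`. Note that an object column loses vectorised numeric operations, so it is probably best done as a last step before writing the file.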

Tytrox