
I have a string exactly like the one below. My goal is to split it into a dataframe, but I am having trouble getting it to work. I have searched Stack Overflow but got nowhere.

'Position             Players   Average Form\nGoalkeeper        Manuel Neuer  4.17017132535\n  Defender         Diego Godin  4.14973163459\n  Defender   Giorgio Chiellini  4.10115207373\n  Defender        Thiago Silva  3.93318274318\n  Defender     Andrea Barzagli  3.85132973289\nMidfielder        Arjen Robben  4.80556193806\nMidfielder     Alexander Meier  4.51037598508\nMidfielder       Franck Ribery  4.48063714064\nMidfielder         David Silva  3.76028050109\n   Forward   Cristiano Ronaldo  7.87909462636\n   Forward  Zlatan Ibrahimovic  6.85401665065'

Is there a way to turn this into a dataframe, in a reproducible way so I could do it with other strings?

My goal dataframe would look as follows:

Position    name                Average
Goalkeeper  Manuel              4.17017132535
Defender    Diego               4.14973163459
Defender    Giorgio             4.10115207373
Defender    Thiago              3.93318274318
Defender    Andrea              3.85132973289
Midfielder  Arjen               4.80556193806
Midfielder  Alexander           4.51037598508
Midfielder  Franck              4.48063714064
Midfielder  David               3.76028050109
Forward     Cristiano           7.87909462636
Forward     Zlatan              6.85401665065

I am new to pandas, so any help would be greatly appreciated.

  • Possible duplicate of [How to create a Pandas DataFrame from a string](https://stackoverflow.com/questions/22604564/how-to-create-a-pandas-dataframe-from-a-string) – Georgy Apr 18 '18 at 08:54
  • Create a dictionary variable y, load your string into it, clean up the whitespace in your data, and convert it into a dataframe: y = {}; y = your_string; z = pd.DataFrame([y.split("\n")]); z.head(). This should give you your dataframe. Please adapt the code to your requirements. – Shrinivas Deshmukh Apr 18 '18 at 14:22
  • If one of the below solutions solved your problem, please consider accepting (green tick on left), or feel free to ask for clarification. – jpp Apr 18 '18 at 17:22

2 Answers


This is one way.

import pandas as pd

mystr = 'Position             Players   Average Form\nGoalkeeper        Manuel Neuer  4.17017132535\n  Defender         Diego Godin  4.14973163459\n  Defender   Giorgio Chiellini  4.10115207373\n  Defender        Thiago Silva  3.93318274318\n  Defender     Andrea Barzagli  3.85132973289\nMidfielder        Arjen Robben  4.80556193806\nMidfielder     Alexander Meier  4.51037598508\nMidfielder       Franck Ribery  4.48063714064\nMidfielder         David Silva  3.76028050109\n   Forward   Cristiano Ronaldo  7.87909462636\n   Forward  Zlatan Ibrahimovic  6.85401665065'

lst = mystr.split()
data = [lst[pos:pos+4] for pos in range(0, len(lst), 4)]

df = pd.DataFrame(data[1:], columns=data[0])

print(df)

#       Position    Players      Average           Form
# 0   Goalkeeper     Manuel        Neuer  4.17017132535
# 1     Defender      Diego        Godin  4.14973163459
# 2     Defender    Giorgio    Chiellini  4.10115207373
# 3     Defender     Thiago        Silva  3.93318274318
# 4     Defender     Andrea     Barzagli  3.85132973289
# 5   Midfielder      Arjen       Robben  4.80556193806
# 6   Midfielder  Alexander        Meier  4.51037598508
# 7   Midfielder     Franck       Ribery  4.48063714064
# 8   Midfielder      David        Silva  3.76028050109
# 9      Forward  Cristiano      Ronaldo  7.87909462636
# 10     Forward     Zlatan  Ibrahimovic  6.85401665065

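This method breaks down in the following cases:

  1. Whitespace in column names, as above: "Average Form" is split into two headers, so the columns end up mislabeled. In this case, you will need to redefine the column names.
  2. Whitespace in player names. This does not appear to be a problem with the data provided, because every name has exactly two words and each row still splits into four tokens; a name with more or fewer words would misalign the rows.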
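Both limitations can be sidestepped if you are willing to assume that the first token on each line is the position and the last token is the score, with everything in between being the player name. A minimal sketch under that assumption (the helper `parse_rows` and the shortened `sample` string are illustrative names, not part of the original answer):

```python
import pandas as pd

# Shortened sample in the same layout as the string in the question.
sample = (
    "Position             Players   Average Form\n"
    "Goalkeeper        Manuel Neuer  4.17017132535\n"
    "  Defender         Diego Godin  4.14973163459\n"
    "   Forward  Zlatan Ibrahimovic  6.85401665065"
)


def parse_rows(text):
    """Assumes: first token = position, last token = score, the rest = name."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        tokens = line.split()
        if len(tokens) < 3:
            continue  # skip blank or malformed lines
        rows.append((tokens[0], " ".join(tokens[1:-1]), float(tokens[-1])))
    return rows


# Column names are set explicitly instead of being taken from the header.
df = pd.DataFrame(parse_rows(sample), columns=["Position", "Players", "Average Form"])

# First name only, as in the desired output of the question.
df["name"] = df["Players"].str.split().str[0]

print(df)
```

Because the name is whatever lies between the first and last token, a player with one word or three words in their name still lands in a single column.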
jpp

Here is how you can work around that.

import pandas as pd
from io import StringIO
data  = StringIO('Position             Players   Average Form\nGoalkeeper        Manuel Neuer  4.17017132535\n  Defender         Diego Godin  4.14973163459\n  Defender   Giorgio Chiellini  4.10115207373\n  Defender        Thiago Silva  3.93318274318\n  Defender     Andrea Barzagli  3.85132973289\nMidfielder        Arjen Robben  4.80556193806\nMidfielder     Alexander Meier  4.51037598508\nMidfielder       Franck Ribery  4.48063714064\nMidfielder         David Silva  3.76028050109\n   Forward   Cristiano Ronaldo  7.87909462636\n   Forward  Zlatan Ibrahimovic  6.85401665065')
df = pd.read_csv(data, sep="\n")
print(df)

Output:

      Position             Players   Average Form
0    Goalkeeper        Manuel Neuer  4.17017132535
1      Defender         Diego Godin  4.14973163459
2      Defender   Giorgio Chiellini  4.10115207373
3      Defender        Thiago Silva  3.93318274318
4      Defender     Andrea Barzagli  3.85132973289
5    Midfielder        Arjen Robben  4.80556193806
6    Midfielder     Alexander Meier  4.51037598508
7    Midfielder       Franck Ribery  4.48063714064
8    Midfielder         David Silva  3.76028050109
9       Forward   Cristiano Ronaldo  7.87909462636
10     Forward  Zlatan Ibrahimovic  6.85401665065
iDrwish
toheedNiaz
  • It's not the right solution. Check df without printing it. Even if you print the string directly you will get the same output. – Space Impact Apr 18 '18 at 09:06
  • Yes, I know that it prints the same way because of "\n", but the data is parsed into a dataframe. What makes you say it is not the right solution? Can you point out the right solution? Try df.tail(2) to see the last 2 elements, or any other df operation of your choice. – toheedNiaz Apr 18 '18 at 09:11
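For reference, here is a sketch of one way to get separate columns out of read_csv: strip the leading padding from each line, then split on runs of two or more whitespace characters, so that the single spaces inside player names and inside "Average Form" survive. It assumes fields are always separated by at least two spaces; the `raw` sample is shortened and `cleaned` is an illustrative name.

```python
from io import StringIO

import pandas as pd

# Shortened sample in the same layout as the string in the question.
raw = (
    "Position             Players   Average Form\n"
    "Goalkeeper        Manuel Neuer  4.17017132535\n"
    "  Defender         Diego Godin  4.14973163459\n"
    "   Forward  Zlatan Ibrahimovic  6.85401665065"
)

# Remove the padding at the start of each line so that it is not
# read as an empty leading field.
cleaned = "\n".join(line.strip() for line in raw.splitlines())

# A regular-expression separator requires the python parsing engine.
df = pd.read_csv(StringIO(cleaned), sep=r"\s{2,}", engine="python")

print(df.columns.tolist())
print(df.dtypes)
```

Checking `df.columns` and `df.dtypes` rather than the printed frame is a quick way to confirm that the fields really were split into columns.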