3

The example would be a list called 'main' looks like

main =  [('date', '2020-04-21'),  ('oldname', 'Tap'),  ('newname', 'Tapnew'),  ('icon_url',   '3'),  ('date', '2020-04-21'),  ('oldname', 'Nod'),  ('newname', 'Nodnew'),  ('icon_url','4'),  ('date', '2020-04-21'),  ('oldname', 'Mik'),  ('newname', 'Miknew'),  ('icon_url','5')]

I've tried to directly parse and transform using this.

df = pd.DataFrame(main)
test = df.T
test.columns = test.iloc[0]
a = test.drop(test.index[0])

However the result dataframe is still a long sparse form with repeated columns

 date      oldname     newname    icon_url     date      oldname     newname    icon_url    date      oldname     newname    icon_url 
2020-04-21    Tap      Tapnew        3       2020-04-21      Nod     Nodnew       4      2020-04-21       Mik     Miknew      5  

The desired output would be

 date      oldname     newname    icon_url     
2020-04-21    Tap     Tapnew        3     
2020-04-21    Nod     Nodnew        4      
2020-04-21    Mik     Miknew        5  

I've been struggle the whole day --- Can anyone shed some lights on this? Thanks in advance.

ALollz
  • 57,915
  • 7
  • 66
  • 89
martinv
  • 31
  • 3

6 Answers6

3

Convert the list of tuple into dictionary.

In [62]: def tuple_to_dict(some_list):
    ...:     result = {}
    ...:     for k, v in some_list:
    ...:         result.setdefault(k, []).append(v)
    ...:
    ...:     return result
    ...:

In [63]: tuple_to_dict(main)
Out[63]:
{'date': ['2020-04-21', '2020-04-21', '2020-04-21'],
 'oldname': ['Tap', 'Nod', 'Mik'],
 'newname': ['Tapnew', 'Nodnew', 'Miknew'],
 'icon_url': ['3', '4', '5']}

In [64]: df = pd.DataFrame(tuple_to_dict(main))

In [65]: df
Out[65]:
         date oldname newname icon_url
0  2020-04-21     Tap  Tapnew        3
1  2020-04-21     Nod  Nodnew        4
2  2020-04-21     Mik  Miknew        5
Rajat Mishra
  • 3,635
  • 4
  • 27
  • 41
1

From df = pd.DataFrame(main) it's just pivot on two columns (more about pivot here):

(pd.DataFrame(main, columns=['col','val'])
   .assign(idx=lambda x: x.groupby('col').cumcount())
   .pivot('idx','col','val')
)

Output:

col        date icon_url newname oldname
idx                                     
0    2020-04-21        3  Tapnew     Tap
1    2020-04-21        4  Nodnew     Nod
2    2020-04-21        5  Miknew     Mik
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
1
s=a.melt()
s['i']=s.groupby(0).cumcount()
s=s.pivot(index='i',columns=0,values='value')
0        date icon_url newname oldname
i                                     
0  2020-04-21        3  Tapnew     Tap
1  2020-04-21        4  Nodnew     Nod
2  2020-04-21        5  Miknew     Mik
halfer
  • 19,824
  • 17
  • 99
  • 186
BENY
  • 317,841
  • 20
  • 164
  • 234
0

Read in the DataFrame as you have. Then create an index for groups of data by checking where the word is 'date' and taking the cumsum. At this point we just pivot

df = pd.DataFrame(main)
df['index'] = df[0].eq('date').cumsum()
df = df.pivot(index='index', columns=0, values=1).rename_axis(None, axis=1)

             date icon_url newname oldname
index                                     
1      2020-04-21        3  Tapnew     Tap
2      2020-04-21        4  Nodnew     Nod
3      2020-04-21        5  Miknew     Mik
ALollz
  • 57,915
  • 7
  • 66
  • 89
0

This solution uses Python and should be more efficient for large data. It makes use of the fact that Python lists are ordered and you are looking at parsing tuples in the group of 4 (variable n in the solution)

main =  [('date', '2020-04-21'),  ('oldname', 'Tap'),  ('newname', 'Tapnew'),  ('icon_url',   '3'),  ('date', '2020-04-21'),  ('oldname', 'Nod'),  ('newname', 'Nodnew'),  ('icon_url','4'),  ('date', '2020-04-21'),  ('oldname', 'Mik'),  ('newname', 'Miknew'),  ('icon_url','5')]

n = 4

# Extract column names
main_columns = [item[0] for item in main[:n]]
# Extract values
main_values = [item[1] for item in main]
# Reshape values to return list of lists
main_reshaped = [main_values[(i-1)*n:(i*n)] for i in range(1, len(main_values)//4+1)]

# Call DataFrame constructor
pd.DataFrame(main_reshaped, columns = main_columns)


    date        oldname newname icon_url
0   2020-04-21  Tap     Tapnew  3
1   2020-04-21  Nod     Nodnew  4
2   2020-04-21  Mik     Miknew  5
Vaishali
  • 37,545
  • 5
  • 58
  • 86
0

u could use defaultdict to get a dictionary, then read into a dataframe:

from collections import defaultdict
d = defaultdict(list)
for k,v in main:
    d[k].append(v)

pd.DataFrame(d)

date    oldname newname icon_url
0   2020-04-21  Tap Tapnew  3
1   2020-04-21  Nod Nodnew  4
2   2020-04-21  Mik Miknew  5
sammywemmy
  • 27,093
  • 4
  • 17
  • 31