0

Don't know how to do tolist() on one column without removing other columns.

I have 3 columns

category      |        item         |    subcategory

Construction  |  [28, 0, 72168025]  |     tools

I want to unpack df.item so ill get:

category      |  name   |  price  |   view     |  subcategory

Construction  |   28    |    0    |  72168025  |    tools

i did:

df = pd.DataFrame(df.item.tolist(), columns=['Name', 'Price', 'View']) 

but i get:

|  name   |  price  |   view     |

|   28    |    0    |  72168025  |  

How to include other columns to df.

Deskom88
  • 39
  • 6

4 Answers4

1

If your column consists of a list on each row, you don't need .tolist() to cast this into separate columns.

Here is one possible approach to resolve this

Generate some dummy data

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(10,1), columns=list('A'))
df['category'] = 'Construction'
df['item'] = [[28,0,72168025]]*df.shape[0]
df['subcategory'] = 'tools'
print(df)
          A      category               item subcategory
0  0.972818  Construction  [28, 0, 72168025]       tools
1  0.583059  Construction  [28, 0, 72168025]       tools
2  0.784836  Construction  [28, 0, 72168025]       tools
3  0.393868  Construction  [28, 0, 72168025]       tools
4  0.806041  Construction  [28, 0, 72168025]       tools
5  0.871041  Construction  [28, 0, 72168025]       tools
6  0.573951  Construction  [28, 0, 72168025]       tools
7  0.513052  Construction  [28, 0, 72168025]       tools
8  0.982331  Construction  [28, 0, 72168025]       tools
9  0.713301  Construction  [28, 0, 72168025]       tools

Now, use apply(pd.Series) on the item column (per this SO post or this one) to get a separate dataframe, where each list element corresponds to a separate pandas Series, and assign column names

df_split = df['item'].apply(pd.Series)
df_split.columns = ['Name', 'Price', 'View']

Finally concatenate the original dataframe with the new (split) dataframe

df = pd.concat([df, df_split], axis=1)

Output

print(df)
          A      category               item subcategory  Name  Price      View
0  0.684692  Construction  [28, 0, 72168025]       tools    28      0  72168025
1  0.404291  Construction  [28, 0, 72168025]       tools    28      0  72168025
2  0.084463  Construction  [28, 0, 72168025]       tools    28      0  72168025
3  0.060698  Construction  [28, 0, 72168025]       tools    28      0  72168025
4  0.096269  Construction  [28, 0, 72168025]       tools    28      0  72168025
5  0.539278  Construction  [28, 0, 72168025]       tools    28      0  72168025
6  0.159661  Construction  [28, 0, 72168025]       tools    28      0  72168025
7  0.651479  Construction  [28, 0, 72168025]       tools    28      0  72168025
8  0.961392  Construction  [28, 0, 72168025]       tools    28      0  72168025
9  0.741887  Construction  [28, 0, 72168025]       tools    28      0  72168025

(Optional) Drop the original item column

df.drop(['item'], axis=1, inplace=True)
print(df)
          A      category subcategory  Name  Price      View
0  0.833281  Construction       tools    28      0  72168025
1  0.229584  Construction       tools    28      0  72168025
2  0.403571  Construction       tools    28      0  72168025
3  0.822803  Construction       tools    28      0  72168025
4  0.968666  Construction       tools    28      0  72168025
5  0.053424  Construction       tools    28      0  72168025
6  0.759824  Construction       tools    28      0  72168025
7  0.766610  Construction       tools    28      0  72168025
8  0.752378  Construction       tools    28      0  72168025
9  0.056715  Construction       tools    28      0  72168025

Edit: although this approach is feasible, there are faster approaches than using apply - see here.

edesz
  • 11,756
  • 22
  • 75
  • 123
1

join + pop

Join a dataframe constructed from a list of lists:

df = df.join(pd.DataFrame(df.pop('item').values.tolist()).add_prefix('item'))

Example

# data from @cronoik
data = [('Construction',[28,0,7216825], 'tools')]
labels = ['category', 'item', 'subcategory']
df = pd.DataFrame.from_records(data, columns=labels)

df = df.join(pd.DataFrame(df.pop('item').values.tolist()).add_prefix('item'))

print(df)

       category subcategory  item0  item1    item2
0  Construction       tools     28      0  7216825
jpp
  • 159,742
  • 34
  • 281
  • 339
0

You could use your original solution and join the result onto your original DataFrame, dropping the original column:

df2 = pd.DataFrame(df.item.tolist(), columns=['Name', 'Price', 'View'], index=df.index)
final_df = df.join(df2).drop("item", axis=1) 
Sven Harris
  • 2,884
  • 1
  • 10
  • 20
0

You get this result because you create a new dataframe from your item column. What you actually want to do is to add new columns to your existing dataframe:

import pandas as pd
data = [('Construction',[28,0,7216825], 'tools')]
labels = ['category', 'item', 'subcategory']
df = pd.DataFrame.from_records(data, columns=labels)

#Adding the new columns based on the split
df[['name','price', 'view']] = pd.DataFrame(df.item.tolist())
#dropping the unneeded item column
df.drop('item', axis=1, inplace=True )
cronoik
  • 15,434
  • 3
  • 40
  • 78