0

I have Dataframe:enter image description here

On the 16th row there is array with multiple items, how to put one item to next row and delete from previous? It should look like this:

16  76561198316667197  MW  0.13028361  11715671189
17  76561198315155019  MW  0.13529198  11715512049
18  76561198338763319  MW  0.14147347  11712963469

3 Answers3

1

Try the explode function by MaxU:

def explode(df, lst_cols, fill_value=''):
    # make sure `lst_cols` is a list
    if lst_cols and not isinstance(lst_cols, list):
        lst_cols = [lst_cols]
    # all columns except `lst_cols`
    idx_cols = df.columns.difference(lst_cols)

    # calculate lengths of lists
    lens = df[lst_cols[0]].str.len()

    if (lens > 0).all():
        # ALL lists in cells aren't empty
        return pd.DataFrame({
            col:np.repeat(df[col].values, df[lst_cols[0]].str.len())
            for col in idx_cols
        }).assign(**{col:np.concatenate(df[col].values) for col in lst_cols}) \
          .loc[:, df.columns]
    else:
        # at least one list in cells is empty
        return pd.DataFrame({
            col:np.repeat(df[col].values, df[lst_cols[0]].str.len())
            for col in idx_cols
        }).assign(**{col:np.concatenate(df[col].values) for col in lst_cols}) \
          .append(df.loc[lens==0, idx_cols]).fillna(fill_value) \
          .loc[:, df.columns]
Kyle
  • 2,814
  • 2
  • 17
  • 30
0

I am pretty sure this is not the most efficient way to do it, but it works. Assuming your First_Column consist of lists of two.

a = df['First_Column'].values

b = [a[i][0] for i in range(0, len(a))]
c = [a[i][1] for i in range(0, len(a))]

d = {'col1': b, 'col2': c}

new_df = pd.DataFrame(d)
Amir H
  • 115
  • 12
0

I break down the steps.

df = pd.DataFrame({'col1': [[5, 3], [1]], 'col2': [[7, 5], [2]], 'col3': ['MW','MW']})#sample data
df1=df[df.col1.apply(len)>1]
df2=df[df.col1.apply(len)==1]
df1=df1.set_index('col3').stack().apply(pd.Series).stack().unstack(-2).reset_index().drop('level_1',1)
df2=df2.apply(lambda x : x.values[0])
pd.concat([df1,df2],axis=0).reset_index(drop=True)
Out[533]: 
   col1  col2 col3
0     5     7   MW
1     3     5   MW
2     1     2   MW
BENY
  • 317,841
  • 20
  • 164
  • 234