put items from array list to unique rows (Pandas)

Question

I have Dataframe:

On the 16th row there is array with multiple items, how to put one item to next row and delete from previous? It should look like this:

16  76561198316667197  MW  0.13028361  11715671189
17  76561198315155019  MW  0.13529198  11715512049
18  76561198338763319  MW  0.14147347  11712963469

In these situations, you should ask yourself, "how did I get here"? Can you prevent this from happening in the first place? Generally you want a single value for each row for each column. — Kyle, Sep 01 '17 at 16:00
I didn't understand you... What do you mean? Do you have any ideas? — Nurislom Rakhmatullaev, Sep 01 '17 at 16:08
How did you read in df? Can you do that differently to prevent row 16 from having duplicate data? — Kyle, Sep 01 '17 at 16:09

score 1 · Accepted Answer · answered Sep 01 '17 at 16:36

Try the explode function by MaxU:

def explode(df, lst_cols, fill_value=''):
    # make sure `lst_cols` is a list
    if lst_cols and not isinstance(lst_cols, list):
        lst_cols = [lst_cols]
    # all columns except `lst_cols`
    idx_cols = df.columns.difference(lst_cols)

    # calculate lengths of lists
    lens = df[lst_cols[0]].str.len()

    if (lens > 0).all():
        # ALL lists in cells aren't empty
        return pd.DataFrame({
            col:np.repeat(df[col].values, df[lst_cols[0]].str.len())
            for col in idx_cols
        }).assign(**{col:np.concatenate(df[col].values) for col in lst_cols}) \
          .loc[:, df.columns]
    else:
        # at least one list in cells is empty
        return pd.DataFrame({
            col:np.repeat(df[col].values, df[lst_cols[0]].str.len())
            for col in idx_cols
        }).assign(**{col:np.concatenate(df[col].values) for col in lst_cols}) \
          .append(df.loc[lens==0, idx_cols]).fillna(fill_value) \
          .loc[:, df.columns]

Amir H · Answer 2 · 2017-09-02T09:04:57.463

0

I am pretty sure this is not the most efficient way to do it, but it works. Assuming your First_Column consist of lists of two.

a = df['First_Column'].values

b = [a[i][0] for i in range(0, len(a))]
c = [a[i][1] for i in range(0, len(a))]

d = {'col1': b, 'col2': c}

new_df = pd.DataFrame(d)

edited Sep 02 '17 at 09:04

answered Sep 01 '17 at 16:26

Amir H

115
12

score 0 · Answer 3 · answered Sep 01 '17 at 16:35

I break down the steps.

df = pd.DataFrame({'col1': [[5, 3], [1]], 'col2': [[7, 5], [2]], 'col3': ['MW','MW']})#sample data
df1=df[df.col1.apply(len)>1]
df2=df[df.col1.apply(len)==1]
df1=df1.set_index('col3').stack().apply(pd.Series).stack().unstack(-2).reset_index().drop('level_1',1)
df2=df2.apply(lambda x : x.values[0])
pd.concat([df1,df2],axis=0).reset_index(drop=True)
Out[533]: 
   col1  col2 col3
0     5     7   MW
1     3     5   MW
2     1     2   MW

put items from array list to unique rows (Pandas)

3 Answers3