re and pandas, reshaping lists

Question

I have a list of strings formatted as:

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]

I used re to reformat the list as such:

[in] test_set1 = [ re.split(r',', line, maxsplit=5) for line in testing_set]

[out] ["001","P01"]

How can I create a DataFrame where the index is the transaction_id (001, 002, 003, 004) and the P-values for each line are listed in the product_id column?

Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. — MaxU - stand with Ukraine, Jul 29 '17 at 18:46

DJK · Accepted Answer · 2017-07-29T19:48:12.653


This can be done as follows:

import re
import pandas as pd

testing_set = ["001,P01","002,P01,P02","003,P01,P02,P09","004,P01,P03"]

# change maxsplit to 1: split off only the transaction id, keep the products in one string
test_set1 = [re.split(r',', line, maxsplit=1) for line in testing_set]

df = pd.DataFrame(test_set1, columns=['transaction_id', 'product_id'])
df.set_index('transaction_id', inplace=True)
# split the remaining string into a list of product ids
df['product_id'] = df['product_id'].apply(lambda products: products.split(','))

This gives you a DataFrame like this:

                     product_id
transaction_id                 
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
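Since the delimiter is a plain comma, the same frame can probably be built without `re` at all. The following is only a sketch of that alternative, not the code from the answer above; the intermediate name `rows` is made up for illustration.

```python
import pandas as pd

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]

# Split every line on commas: the first field is the id, the rest are products.
rows = [line.split(",") for line in testing_set]

df = pd.DataFrame(
    {"product_id": [fields[1:] for fields in rows]},
    index=pd.Index([fields[0] for fields in rows], name="transaction_id"),
)
print(df)
```

Naming the index through `pd.Index(..., name=...)` labels it "transaction_id" directly, so no separate `set_index` call is needed.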

edited Jul 29 '17 at 19:48

answered Jul 29 '17 at 18:36


How can I further split it so that each P-value is a separate string, but still on the same line, so that 002 would have two product_id strings instead of one? Also, how can I label the index as "transaction_id"? – zsad512 Jul 29 '17 at 19:03
There is a typo in the code (df.set_idex(['transaction_id'],inplace=True])): there is an extra ], but the code worked, thank you! Now I have to create a matrix based on this DataFrame, with 1's if a product is in a particular basket and 0's otherwise (for columns "P1-P10"). Do you know how I could do that? – zsad512 Jul 29 '17 at 19:47

That's really another question entirely. I would look into [one-hot encoding](http://pbpython.com/categorical-encoding.html). – DJK Jul 29 '17 at 19:54
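For reference, one possible way to get such a 1/0 matrix from the list column is sketched below. It is an illustration under assumptions, not something proposed in the thread: it rebuilds the frame from the answer by hand, uses `Series.str.join` and `Series.str.get_dummies`, and assumes the full product range is labelled P01 through P10 (the comment only says "P1-P10").

```python
import pandas as pd

# Frame in the shape produced by the answer (rebuilt by hand for this sketch).
df = pd.DataFrame(
    {"product_id": [["P01"], ["P01", "P02"], ["P01", "P02", "P09"], ["P01", "P03"]]},
    index=pd.Index(["001", "002", "003", "004"], name="transaction_id"),
)

# Join each list back into a comma-delimited string, then expand it
# into one indicator column per product that actually occurs.
basket = df["product_id"].str.join(",").str.get_dummies(sep=",")

# Assumption: the complete product range is P01..P10; products that
# never occur get an all-zero column.
all_products = [f"P{i:02d}" for i in range(1, 11)]
full = basket.reindex(columns=all_products, fill_value=0)
print(full)
```

`basket` only has columns for products that appear in the data; the `reindex` step is what pads it out to the assumed ten columns.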
