-2

I have lists inside list where I want to create a new column in my dataframe for each list:

My input can be seen below:

datalist : [[abc1, abc2, abc3, abc4, abc5],[kh1, kh2, kh3, kh4],[jpor1, jpor2, jpor3, jpor4, jpor5]]

Each item in my datalist is a column title in my existing dataframe while each index is also a column in my dataframe which contains the number where I need to extract the value for my new column.

So I want my output to be something like this:

index    abc1     abc2     abc3     abc4    abc5    abc_result
4         87        94      34       28      43         28
2         87        94      34       28      43         94
5         87        94      34       28      43         43
4         87        94      34       28      43         28
1         87        94      34       28      43         87

Because I have 3 lists inside my datalist, I want to have 3 new columns created and added to the dataframe, which are abc_result, kh_result, jpor_result (all based from the index column). I am really confused as I feel I need to make a new list in for every list in my datalist while string formatting the new column title?

So basically the new column value is based on the index column's value. If the value is 1 I want to extract the value from abc1 for abc_result, if 2 then extract the value from abc2, etc. Then another new column for kh_result which also need the value from the kh1/kh2/kh3 based on the index column value.

TC Arlen
  • 1,442
  • 2
  • 11
  • 19

1 Answers1

0

You can try this:

import pandas as pd
import numpy as np

datalist = [
    ["abc1", "abc2", "abc3", "abc4", "abc5"],
    ["kh1", "kh2", "kh3", "kh4"],
    ["jpor1", "jpor2", "jpor3", "jpor4", "jpor5"]
    ]


flat_list = [item for sublist in datalist for item in sublist]
column_groups= set([item[:-1] for sublist in datalist for item in sublist])

For row-wise operation of getting your desired results, you need to create a function:

def grabber(row,key:str):
    return(row[key + str(row['index'])])

This function grabs the value based on the index and the key of column group indicator.

All is remained is to iterate over the column group keys and generate the results. Lets assume that your data is already loaded in df:

df = <load the data>
for key in column_groups:

    df[key + '_result'] = df.apply(lambda x: grabber(x,key), axis=1)

However, this code works if your items only have their last character as number. Otherwise you need to use longest match algorithms for each sublist which you can find here.

Babak Fi Foo
  • 926
  • 7
  • 17
  • Thanks for the answer! but I am still confused on how to work on the each rows value? – stargazing Sep 09 '21 at 07:46
  • @stargazing can you provide input row and what operation you want to perform? – Babak Fi Foo Sep 09 '21 at 08:03
  • so basically the new column value is based on the index column's value. If the value is 1 I want to extract the value from abc1 for abc_result, if 2 then extract the value from abc2, etc. Then another new column for kh_result which also need the value from the kh1/kh2/kh3 based on the index column value. – stargazing Sep 09 '21 at 08:42
  • I modified the answer to your needs. Please update the question appropriately. – Babak Fi Foo Sep 09 '21 at 08:58
  • what if I want to blank the cell if the index is none to be found? for example the index is 7 but I don't have the abc7 column, thus I want to make that cell blank in the abc_result, it should be if else in the lambda function is it? – stargazing Sep 22 '21 at 06:40