0

My dataframe is given below

df =

index    element    data1   data2  data3 
0          M1         10      20     30
1          M1         40      50     60
2          M2         70      80     90
3          M2         100     120     130
4          M3         140     150     160
5          M3         170     180     190
6          M4         200     210     230

element_list1 = ['M1','M4',...........,'M25']
element_list2 = ['M2','M5',...........,'M26']
element_list3 = ['M3','M6',...........,'M27']

Now I want to create a new column. Value in the new column is based on the name of the element. If element belongs to list1 then select data2, list2 then data2, etc. Finally I want to achieve something like below

df =

index    element    data1   data2  data3        final
0          M1         10      20     30          10
1          M1         40      50     60          40
2          M2         70      80     90          80
3          M2         100     120     130        120
4          M3         140     150     160        160
5          M3         170     180     190        190
6          M4         200     210     230        200

My present code is given below:

df['final'] = np.nan

for a in element_list1:
    for i,j in enumerate(df['element']):
        if j==a:
            df['final'].iloc[i] = df['data1'].iloc[i]
for a in element_list2:
    for i,j in enumerate(df['element']):
        if j==a:
            df['final'].iloc[i] = df['data2'].iloc[i]
for a in element_list3:
    for i,j in enumerate(df['element']):
        if j==a:
            df['final'].iloc[i] = df['data3'].iloc[i]

Is there a simple approach than above?

desertnaut
  • 57,590
  • 26
  • 140
  • 166
Mainland
  • 4,110
  • 3
  • 25
  • 56
  • 1
    Try [`numpy.select`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html) (with [`pd.Series.isin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isin.html)), demonstration: [Mapping ranges of values in pandas dataframe](https://stackoverflow.com/a/50098393/9209546). – jpp Sep 13 '19 at 19:54
  • Could you elaborate on how to use this new function? – Mainland Sep 13 '19 at 19:56
  • @jpp I am new to Python. Could you help in constructing the criteria `criteria = [df['element'].in(element_list1), df['element'].in(element_list2), df['element'].in(element_list3)] \n values = [df['data1'], df['data2'], df['data3']] \n df['final'] = np.select(criteria, values, 0) ` Is this correct? – Mainland Sep 13 '19 at 20:02
  • @jpp I am new to Python. Could you help in constructing the criteria criteria = [df['element'].in(element_list1), df['element'].in(element_list2), df['element'].in(element_list3)] \n values = [df['data1'], df['data2'], df['data3']] \n df['final'] = np.select(criteria, values, 0) Is this correct? – Mainland Sep 13 '19 at 20:06
  • 1
    Use `isin` not `in`, see **[`pd.Series.isin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isin.html)**, otherwise looks ok – jpp Sep 13 '19 at 20:09
  • @jpp Wow. np.select is fantastic. It produced me solution in within a second. My actual data frame has 90,000 lines. I am shocked. Thank you for this wonderful solution. My previous solution took more than 15 min and gave some error. – Mainland Sep 13 '19 at 20:13
  • @jpp can you answer my question? so others can see it. – Mainland Sep 13 '19 at 20:14
  • It's really answered in the duplicate link [here](https://stackoverflow.com/questions/50098025/mapping-ranges-of-values-in-pandas-dataframe), feel free to upvote there. So no need to put the same answer in 2 places. `isin` / `between` / `gt` / `eq` / `lt`, or any other comparison operation doesn't make the answer fundamentally different. – jpp Sep 13 '19 at 20:18
  • Try this. ```final = pd.concat([df.data1[df.element.isin(element_list1)], df.data2[df.element.isin(element_list2)], df.data3[df.element.isin(element_list3)],]) df['final'] = final``` @jpp I was in the middle of adding my solution to this question when I found that it has been closed. So, am leaving a note here. – CypherX Sep 13 '19 at 20:28

1 Answers1

1

Solution

# Make element lists
e1 = np.arange(1,26,3)
e2 = e1 + 1
e3 = e1 + 2
element_list1 = [f'M{x}' for x in e1.tolist()]
element_list2 = [f'M{x}' for x in e2.tolist()]
element_list3 = [f'M{x}' for x in e3.tolist()]

element_lists = [element_list1, element_list2, element_list3]

# drop column 'index' from the dataframe
df = df.drop(columns='index')
# process data for 'final' column
final = pd.concat([df.data1[df.element.isin(element_lists[0])], 
                   df.data2[df.element.isin(element_lists[1])],
                   df.data3[df.element.isin(element_lists[2])],])
df['final'] = final
df

Output

enter image description here

Make Data

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd

df_string = """
index    element    data1   data2  data3 
0          M1         10      20     30
1          M1         40      50     60
2          M2         70      80     90
3          M2         100     120     130
4          M3         140     150     160
5          M3         170     180     190
6          M4         200     210     230
"""

df = pd.read_csv(StringIO(df_string), sep="\s+")
CypherX
  • 7,019
  • 3
  • 25
  • 37