0

I have a dataframe with some reference values:

ref_value = {A:111, B:222, C:333, D:444}
df = pd.DataFrame(ref_value)

|foo|bar|
|---|---|
|A  |111|
|B  |222|
|C  |333|
|D  |444|

I then want to use it to create a bigger dataset to look like this:

| x | y |
|---|---|
|1  |111|
|2  |111|
|3  |111|
|4  |111|
|5  |111|
|6  |222|
|7  |222|
|8  |222|
|9  |222|
|10 |222|
|11 |333|
|12 |333|
|13 |333|
|14 |333|
|15 |333|
|16 |444|
|17 |444|
|18 |444|
|19 |444|
|20 |444|

This is how I've done it:

new_df = pd.DataFrame(np.arange(20))

new_df.loc[new_df.x <= 5, 'y'] = df.loc[df.foo == 'A', 'bar'].iloc[0]
new_df.loc[(new_df.x > 5) & (new_df.x <= 10), 'y'] = df.loc[df.foo == 'B', 'bar'].iloc[0]
new_df.loc[(new_df.x > 10) & (new_df.x <= 15), 'y'] = df.loc[df.foo == 'C', 'bar'].iloc[0]
new_df.loc[new_df.x > 15, 'y'] = df.loc[df.foo == 'D', 'bar'].iloc[0]

May I have some suggestions on how to do this more easily/elegantly?

I can't seem to get df.apply(lambda x: function) to work in this case. NB: The actual DataFrames I'm working with are a bit larger.

Thanks in advance.

JezTong
  • 3
  • 2

3 Answers3

0

I think that you can use pd.Series.apply as follows:

import pandas as pd
import numpy as np
def replace_values(v, ref_values):
    if v <= 5:
        return ref_values["A"]
    elif 5 < v <= 10:
        return ref_values["B"]
    elif 10 < v <= 15:
        return ref_values["C"]
    elif 15 < v:
        return ref_values["D"]
    return np.nan

if __name__ == '__main__':
    ref_value = {"A":111, "B":222, "C":333, "D":444}
    new_df = pd.DataFrame({"x": range(1,21)})
    new_df["y"] = new_df["x"].apply(lambda v: replace_values(v, ref_value))
    print(new_df)

Result:

     x    y
0    1  111
1    2  111
2    3  111
3    4  111
4    5  111
5    6  222
6    7  222
7    8  222
8    9  222
9   10  222
10  11  333
11  12  333
12  13  333
13  14  333
14  15  333
15  16  444
16  17  444
17  18  444
18  19  444
19  20  444
abysslover
  • 683
  • 5
  • 14
  • I was originally looking for a solution like this but the one by BENY solved it with a more simple solution. Thanks – JezTong Apr 16 '21 at 21:53
0

This should work:

df = df['bar'].repeat(5).reset_index(drop=True)
df.index = df.index+1
df = df.reset_index().set_axis(['x','y'],axis=1)
rhug123
  • 7,893
  • 1
  • 9
  • 24
0

Try with create the mapping dataframe with cut , notice you can set the cut point by change the bins

mapping = pd.DataFrame({'newvalue' : range(1,21),
                        'foo':pd.cut(range(1,21),bins=[0,5,10,15,20],
                              labels=['A','B','C','D'])})
out = mapping.merge(df)
Out[53]: 
    newvalue foo  bar
0          1   A  111
1          2   A  111
2          3   A  111
3          4   A  111
4          5   A  111
5          6   B  222
6          7   B  222
7          8   B  222
8          9   B  222
9         10   B  222
10        11   C  333
11        12   C  333
12        13   C  333
13        14   C  333
14        15   C  333
15        16   D  444
16        17   D  444
17        18   D  444
18        19   D  444
19        20   D  444
BENY
  • 317,841
  • 20
  • 164
  • 234