0

I have a situation,there are 3 columns(x,y,point) in my dataframe and I am copying id table from another df,but problem is that all columns have different values in them .I want to copy id column with point col such that point 1,2,3 have same id. df is

x           y                 point
76.32       42.58              1
86.34       90.02              2
687.12      125.327            1
65.28       87.11              2
963.1       854.2              3

id colum from df1
M123
P478
Q145
J768
D476

expected output is

x           y                 point     id
    76.32       42.58              1    M123
    86.34       90.02              2    M123
    687.12      125.327            1    P478
    65.28       87.11              2    P478
    963.1       854.2              3    P478
    85.23       96.23              1    Q145

I am trying this code

df1["id"]=  df["id"]

and output is

x           y                 point     id
    76.32       42.58              1    M123
    86.34       90.02              2    P478
    687.12      125.327            1    Q145
    65.28       87.11              2    j768
    963.1       854.2              3    D476
    85.23       96.23              1    P145

there are hundreds of rows in df so i cannot give row position. Is there any way to do it? any help will be appreciated

user19
  • 93
  • 9

1 Answers1

1

If each group starting by 1 then is possible use Series.map by dictionary, each group is created compare by 1 with Series.eq and Series.cumsum:

df['id'] = df['point'].eq(1).cumsum().map(dict(enumerate(df1['id'], 1)))
print (df)
        x        y  point   id
0   76.32   42.580      1  111
1   86.34   90.020      2  111
2  687.12  125.327      1  222
3   65.28   87.110      2  222
4  963.10  854.200      3  222
5   85.23   96.230      1  333

Detail:

print (df['point'].eq(1).cumsum())
0    1
1    1
2    2
3    2
4    2
5    3
Name: point, dtype: int32

Or if necessary check difference between groups:

df['id'] = df['point'].diff().lt(0).cumsum().map(dict(enumerate(df1['id'])))
print (df)
        x        y  point   id
0   76.32   42.580      1  111
1   86.34   90.020      2  111
2  687.12  125.327      1  222
3   65.28   87.110      2  222
4  963.10  854.200      3  222
5   85.23   96.230      1  333

print (df['point'].diff().lt(0).cumsum())
0    0
1    0
2    1
3    1
4    1
5    2
Name: point, dtype: int32
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • @user19 - orignal DataFrame is `df`, new is `df1` with column `id` in my answer. There is some problem? – jezrael Jan 10 '20 at 09:39
  • @user19 - Can you changed data in question like comment above? – jezrael Jan 10 '20 at 09:51
  • @user19 - Solution in my answer working for changed data same very well – jezrael Jan 10 '20 at 09:59
  • @user19 - What is `dict(enumerate(df1['id'].head(), 1))` ? – jezrael Jan 10 '20 at 10:05
  • It should be `print (dict(enumerate(df1['id'].head(), 1))) {1: 'M123', 2: 'P478', 3: 'Q145', 4: 'J768', 5: 'D476'}` – jezrael Jan 10 '20 at 10:06
  • @ jezrael if i try to add prefix with column value, df['id'] = df['point'].eq(1).cumsum().map(dict(enumerate("AB_"+df1['id'].astype(str), 1))) it gives output as AB_111.0, AB_222.o. do you have any idea why it is adding ".0" ? – user19 Jan 10 '20 at 11:55
  • 1
    @user19 - first try change `.map(dict(enumerate("AB_"+df1['id'].astype(str), 1)))` to `.map(dict(enumerate("AB_"+df1['id'].astype(int).astype(str), 1)))`, if not working try `.map(dict(enumerate("AB_"+df1['id'].astype('Int64').astype(str), 1)))` - last working in pandas 0.25+ – jezrael Jan 10 '20 at 11:57
  • it is throwing following error "Cannot convert non-finite values (NA or inf) to integer" although there are not NAN values – user19 Jan 10 '20 at 12:41
  • @user19 - You can count missing values by `df1['id'].isna().sum()` – jezrael Jan 10 '20 at 12:44
  • @user19 - `.map(dict(enumerate("AB_"+df1['id'].astype('Int64').astype(str), 1)))` not working? – jezrael Jan 10 '20 at 12:44
  • @user19 - Never seen error before, maybe some data related issue? – jezrael Jan 10 '20 at 13:12
  • If use `"AB_"+df["id"].astype(str).map(str))+("_"+ df1["point"].astype(str)` it working? – jezrael Jan 10 '20 at 13:14
  • hmm, not missing last `)` or `))` ? – jezrael Jan 10 '20 at 13:25