1

this is my first question at stackoverflow.

I have two dataframes of different sizes df1(266808 rows) and df2 (201 rows). df1

and

df2

I want to append the count of each value/number in df1['WS_140m'] to df2['count'] if number falls in a class interval given in df2['Class_interval'].

I have tried

1)

df2['count']=pd.cut(x=df1['WS_140m'], bins=df2['Class_interval'])

2)

df2['count'] = df1['WS_140m'].groupby(df1['Class_interval'])

3)

for anum in df1['WS_140m']:
    if anum in df2['Class_interval']:
        df2['count'] = df2['count'] + 1

Please guide, if someone knows.

  • Can you show a generic input example and expected output? – Jeroen Apr 13 '20 at 08:23
  • @Jeroen for example if df1['WS_140m'] has 327 values in class_interval (0.05, 0.15] and 400 values in class_interval (0.15, 0.25] the output of df2 should look like this in df2['count'] and so on for each class interval for df2. example in link. https://drive.google.com/file/d/1S7GwYGxIZe-58s0KQqStpkC6oATp6twL/view?usp=sharing – Arslan Adeel Ur Rehman Apr 13 '20 at 08:46

2 Answers2

0

Please try something like:


def in_class_interval(value, interval):
    #TODO
def in_class_interval_closure(interval):
   return lambda x: in_class_interval(x, interval)

df2['count'] = df2['Class_interval']
      .apply(lambda x: df1[in_class_interval_closure(x)(df1['WS_140m'])].size,axis=1)

Define your function in_class_interval(value, interval), which returns boolean.

Artur Kasza
  • 356
  • 1
  • 9
0

I guess something like this would do it:

In [330]: df1                                                                                                                                                                                               
Out[330]: 
   WS_140m
0     5.10
1     5.16
2     5.98
3     5.58
4     4.81

In [445]: df2                                                                                                                                                                                               
Out[445]: 
   count Class_interval
0      0            NaN
1      0    (0.05,0.15]
2      0    (0.15,0.25]
3      0    (0.25,0.35]
4      0    (3.95,5.15]

In [446]: df2.Class_interval = df2.Class_interval.str.replace(']', ')')

In [451]: from ast import literal_eval
In [449]: for i, v in df2.Class_interval.iteritems(): 
     ...:     if pd.notnull(v): 
     ...:         df2.at[i, 'Class_interval'] = literal_eval(df2.Class_interval[i]) 

In [342]: df2['falls_in_range'] = df1.WS_140m.between(df2.Class_interval.str[0], df2.Class_interval.str[1])                                                                                                 

You can increase the count wherever True comes like below :

In [360]: df2['count'] = df2.loc[df2.index[df2['falls_in_range'] == True].tolist()]['count'] +1                                                                                                             

In [361]: df2                                                                                                                                                                                               
Out[361]: 
   count Class_interval  falls_in_range
0    NaN            NaN           False
1    NaN   (0.05, 0.15)           False
2    NaN   (0.15, 0.25)           False
3    NaN   (0.25, 0.35)           False
4    1.0   (3.95, 5.15)            True
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58