Below is the dataframe. PIC_1 and Wgt are strings, and p.lgth and p_lgth are integers. If p_lgth is not equal to 30, I want to find the first occurrence of 42 in PIC_1 and extract it together with the 15 digits that follow it (17 characters in total).

                                            PIC_1  Wgt  p.lgth  p_lgth
**PARTIAL-DECODE***P / 42011721930018984390078...  112      53      53

So the output for the row above should be 42011721930018984.

My code, which does not work, follows:

def pic_mod(row):
 if row['p_lgth'] !=30:
    PIC_loc = row['PIC_1'].find('42')
    PIC_2 = row['PIC_1'].str[PIC_loc:PIC_loc + 15]
 elif row['p_lgth']==30:
    PIC_2=PIC_1  
 return PIC_2

row_1 is just a row from the larger df that is identical to the example row given above:

 row_1 = df71[2:3]
 pic_mod(row_1)

 ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I did type() on the variables and got

  type(df71['PIC_1']) = pandas.core.series.Series
  type(df71['p_lgth']) = pandas.core.series.Series
  type(df71['Wgt']) = pandas.core.series.Series

I'm fairly new to Python. Should these data types come back as int and str? df71 is a df.

M Sanders

1 Answer

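According to the error message in your post, the if condition receives a Series instead of a single True/False value, because df71[2:3] is a one-row DataFrame rather than a row. Perhaps try this one:

def pic_mod(row):
    # row is a one-row DataFrame, so compare element-wise first
    # and then reduce the result to a single bool with .any()
    if (row['p_lgth'] != 30).any():
        # .iloc[0] takes the first entry by position, whatever its index label
        PIC_loc = row['PIC_1'].str.find('42').iloc[0]
        # '42' plus the 15 characters after it = 17 characters
        PIC_2 = row['PIC_1'].str[PIC_loc:PIC_loc + 17]
    else:
        PIC_2 = row['PIC_1']
    return PIC_2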
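
As for the types: df71['PIC_1'] is a whole column, so it is always a Series, and df71[2:3] is a slice, so it stays a DataFrame. The following is only a minimal sketch under assumptions: the frame is a stand-in built from the example row, and the second row is invented to exercise the p_lgth == 30 branch. It illustrates that selecting a single row with .iloc, or letting apply(axis=1) pass rows one at a time, gives scalar values, so the plain string logic from the question works unchanged.

```python
import pandas as pd

# Stand-in data: the first row mirrors the example in the question;
# the second row is invented purely to exercise the p_lgth == 30 branch.
df = pd.DataFrame({
    'PIC_1': ['**PARTIAL-DECODE***P / 42011721930018984390078...',
              '420117219300189843900781234567'],
    'Wgt': ['112', '98'],
    'p_lgth': [53, 30],
})

def pic_mod(row):
    # Here row is a Series holding one row, so row['p_lgth'] is a scalar
    # and row['PIC_1'] is a plain str.
    if row['p_lgth'] != 30:
        loc = row['PIC_1'].find('42')
        return row['PIC_1'][loc:loc + 17]  # '42' plus the 15 characters after it
    return row['PIC_1']

one_row_frame = df[0:1]   # slicing keeps the DataFrame type
single_row = df.iloc[0]   # positional selection returns a Series

print(type(one_row_frame['p_lgth']))  # a pandas Series
print(type(single_row['p_lgth']))     # a scalar integer type
print(pic_mod(single_row))

# Apply the function to every row and store the result in a new column.
df['PIC_2'] = df.apply(pic_mod, axis=1)
```

The column name PIC_2 for the result is an assumption taken from the variable name in the question.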

However, if your data is already structured in a pandas dataframe, you normally wouldn't write such an explicit function.

E.g. the initial filtering of all rows in the dataset where p_lgth is not equal to 30 would be a single line like:

df_fltrd = df[df['p_lgth']!=30]

With that done, you can apply an arbitrary function to the entries in the PIC_1 column, e.g. in your case taking the substring of length 17 that starts with '42':

df_fltrd['PIC_1'].apply(lambda x: x[x.find('42'):x.find('42')+17])
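
If the rows with p_lgth equal to 30 should stay in the result unchanged, the filter and the apply can perhaps be combined into one column-wise step. This is only a sketch under assumptions: it swaps find() for a regular expression ('42' followed by exactly 15 digits), the frame is a stand-in built from the example row plus one invented row, and the column name PIC_2 is borrowed from the question.

```python
import pandas as pd

# Stand-in data: first row from the question, second row invented.
df = pd.DataFrame({
    'PIC_1': ['**PARTIAL-DECODE***P / 42011721930018984390078...',
              '420117219300189843900781234567'],
    'p_lgth': [53, 30],
})

# '42' followed by exactly 15 digits; expand=False returns a Series.
extracted = df['PIC_1'].str.extract(r'(42\d{15})', expand=False)

# Keep PIC_1 unchanged where p_lgth == 30, otherwise use the extracted code.
df['PIC_2'] = df['PIC_1'].where(df['p_lgth'] == 30, extracted)

print(df[['p_lgth', 'PIC_2']])
```

Unlike the slicing approach, rows where the pattern is not found end up as NaN instead of a wrong substring, which may or may not be what you want.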
SpghttCd
  • Hmm, sorry... I tested with this DataFrame: `df = pd.DataFrame()` `df['PIC_1'] = ['**PARTIAL-DECODE***P / 42011721930018984390078...']` `df['Wgt'] = [112]` `df['p_lgth'] = [53]` `df['p.lgth'] = [53]` `row_1 = df[:1]` `pic_mod(row_1)` and this leads to the result `Out[67]: 0 420117219300189 Name: PIC_1, dtype: object` – SpghttCd Apr 24 '18 at 19:40