
I have a pandas DataFrame whose columns were read from a CSV file. One of the columns contains strings that include a particular number I want to extract. My code raised a TypeError, which I think is caused by the column's object dtype. However, neither setting the column's dtype when reading the file nor calling astype on that column fixes it. Earlier, I read the same column from the Excel file, and the same regex worked on it without any problem.

The head of the DataFrame looks like this:

  Transaction Date                                        PARTICULARS DEPOSITS WITHDRAWALS Amount Dr/Cr  Calc_Amount  Calc RRN Number RRN-AMT
0       2019-05-30              UPI/914923281641/UPI/raghu.m.v2016@o/        0       32.86  32.86    Dr        32.86     914923281641    0100
1       2019-05-30              UPI/915000512028/UPI/hemanth1999kuma/        0        0.95   0.95    Dr         0.95     915000512028    0100
2       2019-05-30          UPI/RVSL915000512028/UPI/hemanth1999kuma/     0.95           0   0.95    Cr        -0.95     915000512028    0100
3       2019-05-30  UPI/914923451855/UPI/tpmanzoor55@okh/Federal Bank     1.19           0   1.19    Cr        -1.19     914923451855    0100
4       2019-05-30              UPI/914923339262/UPI/ravimaurya8735@/        0        0.94   0.94    Dr         0.94     914923339262    0100

From this code:

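import re

for i, row in bank_statement_30May.iterrows():
    # row[1] is the PARTICULARS value; here it holds bytes, which triggers the TypeError below
    result = [e for e in re.split("[^0-9]", row[1]) if e != '']
    bank_statement_30May.loc[i, "Calc RRN Number"] = max(map(int, result))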

This is the error it raises:

    result = [e for e in re.split("[^0-9]",row[1]) if e != '']
  File "C:\Users\Suraj Joshi\AppData\Local\Programs\Python\Python37\lib\re.py", line 213, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: cannot use a string pattern on a bytes-like object
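The message says `row[1]` is a bytes object, while `"[^0-9]"` is a str pattern, and `re` refuses to mix the two. A minimal sketch of the two usual ways around it, assuming the bytes are UTF-8 encoded (an assumption, not something the question states). The sample value is copied from the head above and turned into bytes for illustration.

```python
import re

# Hypothetical sample: one PARTICULARS value as bytes, which is what
# the TypeError indicates the CSV column holds.
raw = b"UPI/RVSL915000512028/UPI/hemanth1999kuma/"

try:
    re.split("[^0-9]", raw)  # str pattern applied to a bytes object
except TypeError as exc:
    print("TypeError:", exc)  # same error as in the traceback

# Option 1: decode the bytes to str (UTF-8 assumed), then use the str pattern.
text = raw.decode("utf-8")
digits = [e for e in re.split("[^0-9]", text) if e != ""]
largest = max(map(int, digits))

# Option 2: keep the bytes and use a bytes pattern (rb"...") instead;
# int() accepts bytes such as b"915000512028" directly.
digits_b = [e for e in re.split(rb"[^0-9]", raw) if e != b""]
largest_b = max(map(int, digits_b))

print(largest, largest_b)
```

Decoding once is usually the simpler choice, because every later string operation on the column then works without special-casing bytes.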
Rahul
  • Sorry for not being clear. No, I only want the number inside; as you can see, the part between the UPIs sometimes contains other text as well. I want only the numbers shown at the rightmost end of the first code block. I assumed these would always be the largest numbers in the string, so I used that logic. – Suraj Joshi Jun 26 '19 at 06:51
  • okay, so you already have a column with the largest data, now what do you want to do with `Calc RRN Number` column? – anky Jun 26 '19 at 06:53
  • Nothing, I just want that column. I don't 'already' have it, because this was just a sample data I was working on. I have other csv files to put through this, but if it doesn't work for the sample then it won't work for any of them. – Suraj Joshi Jun 26 '19 at 06:55
  • so `df['new_col']=df.PARTICULARS.str.extract('(\d+)',expand=False)` ? – anky Jun 26 '19 at 06:56
  • Yes, this worked for me, thanks a lot! But I also want to work with the new column as a string (concatenate, extract particular digits, etc.) but the data type of the column is object. Will casting work? – Suraj Joshi Jun 26 '19 at 07:03
  • `.astype(float)` should work, since there might be `NaN` values; if not, use `s=df.PARTICULARS.str.extract('(\d+)',expand=False)` and then `pd.to_numeric(s,errors='coerce')` – anky Jun 26 '19 at 07:04
  • Wait, you said you want them as strings? Then the column dtype should be `object`. I am closing this since it is a duplicate. Let me know if you need any help – anky Jun 26 '19 at 07:05
  • Yes, it is working fine as it is. Thank you! – Suraj Joshi Jun 26 '19 at 07:07
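A small sketch of the approach from the comments: `str.extract` for the first run of digits, string operations on the result, and `pd.to_numeric(..., errors='coerce')` for a numeric copy. The sample rows come from the head above, plus one hypothetical row with no digits to show the `NaN` case; the derived column names are hypothetical. It assumes PARTICULARS already holds str values (with bytes, `.str.extract` would not apply).

```python
import pandas as pd

# Sample PARTICULARS values (str); the last row is hypothetical, with no digits.
df = pd.DataFrame({
    "PARTICULARS": [
        "UPI/914923281641/UPI/raghu.m.v2016@o/",
        "UPI/RVSL915000512028/UPI/hemanth1999kuma/",
        "NEFT/no digits here/",
    ]
})

# First run of digits in each string; rows without digits become NaN.
df["RRN"] = df["PARTICULARS"].str.extract(r"(\d+)", expand=False)

# The extracted values are strings, so string operations work on them directly.
df["RRN_last4"] = df["RRN"].str[-4:]   # pick particular digits (last four)
df["RRN_tagged"] = "RRN-" + df["RRN"]  # concatenation; NaN stays NaN

# Numeric copy for arithmetic; anything non-numeric (or NaN) becomes NaN.
df["RRN_num"] = pd.to_numeric(df["RRN"], errors="coerce")

print(df)
```

Keeping both a string column and a numeric column avoids casting back and forth: the string one for slicing and concatenation, the numeric one for comparisons and sums.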

1 Answer


Do you mean something like this?

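# Series has no applymap (DataFrame-only); Series.map applies the lambda to each value, and max(x, key=int) picks the largest digit character
bank_statement_30May['Calc RRN Number'] = bank_statement_30May['Calc RRN Number'].astype(str).map(lambda x: int(max(x, key=int)))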
U13-Forward
  • Yes, but this didn't change anything for me. I read some other answers suggesting `.astype('str')`, but that didn't work either. I don't want to convert Calc RRN Number to strings; I want to convert PARTICULARS to strings so that the regex doesn't raise a TypeError. – Suraj Joshi Jun 26 '19 at 06:54
  • @SurajJoshi After this line do: `print(bank_statement_30May)` – U13-Forward Jun 26 '19 at 06:54
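What the last comment asks for, turning PARTICULARS itself into strings, can be sketched by decoding the column with `Series.str.decode` and then replacing the row loop with a vectorised `str.findall`. This assumes the column holds UTF-8 bytes (inferred from the TypeError, not confirmed), and the sample rows are the head values rewritten as bytes for illustration.

```python
import pandas as pd

# Hypothetical sample: PARTICULARS stored as bytes, as the TypeError suggests.
df = pd.DataFrame({
    "PARTICULARS": [
        b"UPI/914923281641/UPI/raghu.m.v2016@o/",
        b"UPI/RVSL915000512028/UPI/hemanth1999kuma/",
        b"UPI/914923451855/UPI/tpmanzoor55@okh/Federal Bank",
    ]
})

# Decode bytes -> str (UTF-8 assumed) so str regex patterns work on the column.
df["PARTICULARS"] = df["PARTICULARS"].str.decode("utf-8")

# Vectorised form of the question's loop: collect every run of digits,
# convert each run to int and keep the largest (None if a row has no digits).
df["Calc RRN Number"] = df["PARTICULARS"].str.findall(r"\d+").map(
    lambda runs: max(map(int, runs)) if runs else None
)

print(df)
```

This keeps the "largest number" heuristic from the question; if a row could ever contain a larger unrelated number, anchoring the pattern to the segment after `UPI/` would be safer.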