How to place only part of string values in columns?

Question

I have the following dataframe:

import pandas as pd
import numpy as np
d={'P':['A[55]','B','C[98]"'],
   'Q':['C[89]','F[98]','K[97]'],
    'S':['B[89]',67,98],
   'id':['a','b','c']
  }
df=pd.DataFrame(data=d)

How to place only data inside [] for selected columns, i.e. "P" and "Q".

ID should be left as such.

Expected output:

P      Q       S         id
55      89   89           a
np.nan  98    np.nan      b
98      97      np.nan    c

Why is the value in the first row of 'Q' np.nan? Should it not be 89? — kerwei, Feb 27 '19 at 06:47
Possible duplicate of [Get the string within brackets in Python](https://stackoverflow.com/questions/8569201/get-the-string-within-brackets-in-python) — Serenity, Feb 27 '19 at 06:52
You can use df.apply with the function mentioned in the comment above — Tim, Feb 27 '19 at 06:53

cs95 · Accepted Answer · 2019-02-27T07:26:42.657

4

You can handle all columns with one str.extract operation if you stack the data first.

df.loc[:,'P':'S'] = (
    df.loc[:,'P':'S'].astype(str).stack().str.extract(r'\[(\d+)\]').unstack()[0])
df

     P   Q    S id
0   55  89   89  a
1  NaN  98  NaN  b
2   98  97  NaN  c

Details
First, convert the data to string and stack it,

df.loc[:,'P':'S'].astype(str).stack()

0  P     A[55]
   Q     C[89]
   S     B[89]
1  P         B
   Q     F[98]
   S        67
2  P    C[98]"
   Q     K[97]
   S        98
dtype: object

You can then extract numbers with str.extract:

df.loc[:,'P':'S'].astype(str).stack().str.extract(r'\[(\d+)\]')

       0
0 P   55
  Q   89
  S   89
1 P  NaN
  Q   98
  S  NaN
2 P   98
  Q   97
  S  NaN

Then, unstack and assign back.

df.loc[:,'P':'S'].astype(str).stack().str.extract(r'\[(\d+)\]').unstack()[0]

     P   Q    S
0   55  89   89
1  NaN  98  NaN
2   98  97  NaN

edited Feb 27 '19 at 07:26

answered Feb 27 '19 at 07:04

cs95

379,657
97
704
746

it gives syntax error when number of columns df.loc[:,'P':'Q':'S':'P1':'Q1':'S1':'P2':'P3':'Q2':'Q3':'S2':'S3'] = ( df.loc[:,'P':'Q':'S':'P1':'Q1':'S1':'P2':'P3':'Q2':'Q3':'S2':'S3'].astype(str).stack().str.extract(r'\[(\d+)\]').unstack()[0]) – panda Feb 27 '19 at 07:19
@panda Unfortunately, that isn't the correct syntax for selecting multiple columns. I suggest specifying a list of columns to `loc`. – cs95 Feb 27 '19 at 07:22
@panda Can you run just `df.loc[:,'P':'S'].astype(str).stack().str.extract(r'\[(\d+)\]').unstack()` and see if you can extract the data you need in the format? You can see the Details section of my answer for an example. – cs95 Feb 27 '19 at 07:30
df.loc[:,'P':'S','Q'].astype(str).stack().str.extract(r'\[(\d+)\]').unstack()[0] if I run it with [0] for df it gives indexing error without [0] also the same – panda Feb 27 '19 at 07:34
can you help for columns P,Q,S on df – panda Feb 27 '19 at 07:35
@panda `df.loc[:,['P', 'Q', 'S']] = ( df.loc[:,['P', 'Q', 'S']].astype(str)....)` – cs95 Feb 27 '19 at 07:45

meW · Answer 2 · 2019-02-27T07:05:01.850

3

Use extract to fetch digits and change others to NaN. You can iterate over each column, if you have more number of columns:

df.loc[:, 'P'] = df.loc[:, 'P'].str.extract('(\d+)')
df.loc[:, 'Q'] = df.loc[:, 'Q'].str.extract('(\d+)')
df.loc[:, 'S'] = df.loc[:, 'S'].str.extract('(\d+)')
df


+---+-----+----+-----+----+
|   |  P  | Q  |  S  | id |
+---+-----+----+-----+----+
| 0 | 55  | 89 | 89  | a  |
| 1 | NaN | 98 | NaN | b  |
| 2 | 98  | 97 | NaN | c  |
+---+-----+----+-----+----+

edited Feb 27 '19 at 07:05

answered Feb 27 '19 at 07:02

meW

3,832
7
27

it works instead of three lines of df.loc[:, 'P'] = df.loc[:, 'P'].str.extract('(\d+)') for column values can it be changed to singe line – panda Feb 27 '19 at 07:04

How to place only part of string values in columns?

2 Answers2