How would I remove multiple values within one column in Python

Question

I have a file named df that looks like this:

Size       ID      File
500 TB     A       200 TB 
200 TB     B       100 TB
600 TB     C       300 TB

The numerical value along with the text, which is always 'TB', are within one column. How would I transform this and remove the 'TB' text from both columns to give me the desired output of:

Size       ID      File
500        A       200 
200        B       100 
600        C       300

This is what I am doing:

import numpy as np
import pandas as pd

df = df[df[","] > ,]

I am still researching this. Any insight will be helpful.

That's addressed here https://stackoverflow.com/questions/13682044/remove-unwanted-parts-from-strings-in-a-column — DejaVuSansMono, Sep 30 '20 at 16:15

score 3 · Answer 1 · answered Sep 30 '20 at 16:16

3

Let's try str.extract:

df[['Size','File']] = df[['Size','File']].apply(lambda x: x.str.extract('^(\d+)')[0])

Output:

  Size ID File
0  500  A  200
1  200  B  100
2  600  C  300

answered Sep 30 '20 at 16:16

Quang Hoang

146,074
10
56
74

Trenton McKinney · Accepted Answer · 2020-09-30T16:25:31.410

Apply str.split to the columns with pandas.DataFrame.apply, and then select the first element from the list created by .split, with .str[0].
This will work, as long as the pattern shown in the sample is consistent, with the undesired text after the space.
Using .apply in this way, will apply the lambda function to all the columns.
- If ID has values with spaces, then this solution will cause an issue there, which can be resolved by using apply only on the columns that need to be fixed.
  - df[['Size', 'File']] = df[['Size', 'File']].apply(lambda x: x.str.split(' ').str[0])
- If there was only one column to fix, then .apply isn't required.
  - df['Size'] = df['Size'].str.split(' ').str[0]

import pandas as pd

# test dataframe
df =  pd.DataFrame({'Size': ['500 TB', '200 TB', '600 TB'], 'ID': ['A', 'B', 'C'], 'File': ['200 TB ', '100 TB', '300 TB']})

# display(df)
     Size ID     File
0  500 TB  A  200 TB 
1  200 TB  B   100 TB
2  600 TB  C   300 TB

# apply str.split and then select the value at index [0]
df = df.apply(lambda x: x.str.split(' ').str[0])

# display(df)
  Size ID File
0  500  A  200
1  200  B  100
2  600  C  300

score 2 · Answer 3 · answered Sep 30 '20 at 16:17

2

Try str.strip

df[['Size','File']] = df[['Size','File']].apply(lambda x: x.str.strip(' TB'))

answered Sep 30 '20 at 16:17

BENY

317,841
20
164
234

score 1 · Answer 4 · answered Sep 30 '20 at 16:19

1

You can use this

df['Size'] = df['Size'].map(lambda x: x.lstrip('+-').rstrip('TB'))
df['File'] = df['File'].map(lambda x: x.lstrip('+-').rstrip('TB'))

answered Sep 30 '20 at 16:19

Dexter0411

104
1
6

score 1 · Answer 5 · answered Sep 30 '20 at 16:28

1

You can also replace:

df[['Size','File']] = df[['Size','File']].replace('\s+?TB','',regex=True)

  Size ID File
0  500  A  200
1  200  B  100
2  600  C  300

answered Sep 30 '20 at 16:28

anky

74,114
11
41
70

How would I remove multiple values within one column in Python

5 Answers5