Good day,

I have a dataframe where I want to isolate part of the string in one column for each row. The problem is that each row needs a substring of a different length: I want to keep the string only up to the first occurrence of "." (a period) plus the next two letters.

Example:

import pandas as pd

x = [ [ 34, 'Sydney.Au123XX'] ,
             [30, 'Delhi.As1q' ] ,
             [16, 'New York.US3qqa']]
x = pd.DataFrame(x)
x.columns = ["a", "b"]

#now I want to substring each row based on where "." occurs.
#I have tried the following:
y = x["b"].str.slice( stop = x["b"].str.find(".") + 2)
y = x["b"].str[0: x["b"].str.find(".")+ 2]

#desired output
desired = [[34, 'Sydney.Au'],
           [30, 'Delhi.As'],
           [16, 'New York.US']]
desired = pd.DataFrame(desired)
desired.columns = ["a", "b"]

Please see my code for the desired output.

I do not want to use a loop.

Thanks in advance.

rich
  • Why does New York have stuff after the `.` – U13-Forward Jul 26 '19 at 07:44
  • @U10-Forward. Thank you, I did see your answer, and it was correct for the first version of my question. Please see updates to my question. Can str.split() work for the case where I wish to keep the first two characters after the "." as well? – rich Jul 26 '19 at 07:58
  • @U10-Forward. I think for my revised question I could keep x['b'].str.split('.').str[1] to keep the first two letters after ".". Is there a better way? – rich Jul 26 '19 at 08:00
  • Possible duplicate of [How to remove numbers from string terms in a pandas dataframe](https://stackoverflow.com/questions/41719259/how-to-remove-numbers-from-string-terms-in-a-pandas-dataframe) – Georgy Jul 26 '19 at 08:05

2 Answers

IIUC try:

x['b'] = x['b'].str.split('.').str[0]
print(x)

You can also do it as a one-liner:

print(x.assign(b=x['b'].str.split('.').str[0]))

They both output:

    a         b
0  34    Sydney
1  30     Delhi
2  16  New York

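Edit:

For the updated question (keep everything up to the first "." plus the next two characters), do:

x['b'] = x['b'].str.extract(r'^([^.]*\...)', expand=False)
print(x)

Or use:

print(x.assign(b=x['b'].str.extract(r'^([^.]*\...)', expand=False)))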
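
It may help to see what `str.extract` does with rows that don't fit the expected shape. Below is a minimal sketch, assuming a pattern anchored on the first period. The values 'Perth.Au.9x' and 'Tokyo' are invented for illustration and are not from the question. A row with no match comes back as NaN, and `fillna` can put the original value back if that is what you want.

```python
import pandas as pd

# The last two values are made-up edge cases: two periods, and no period at all
s = pd.Series(["Sydney.Au123XX", "Perth.Au.9x", "Tokyo"])

# [^.]* stops at the first period; \..{2} keeps the period plus two characters
out = s.str.extract(r"^([^.]*\..{2})", expand=False)
print(out.tolist())

# Assumption: rows without a period should keep their original value
filled = out.fillna(s)
print(filled.tolist())
```

Passing `expand=False` makes a single capture group return a Series rather than a one-column DataFrame, which keeps the assignment back to `x['b']` unambiguous.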
U13-Forward

Using a list comprehension:

Example:

import pandas as pd

x = [ [ 34, 'Sydney.Au123'] ,
             [30, 'Delhi.As1' ] ,
             [16, 'New York.US3']]

data = [["{0}.{1}".format(i.split(".")[0],i.split(".")[1][0:2]) if isinstance(i,str) else i for i in y] for y in x ]
df  = pd.DataFrame(data,columns=['a','b'])
print(df)

Output:

    a            b
0  34    Sydney.Au
1  30     Delhi.As
2  16  New York.US
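
Since the question asks to avoid loops, the same result can probably be had without leaving pandas. This is a sketch using `Series.str.partition`, which splits each string at the first occurrence of the separator. It is one possible alternative and assumes every value contains a period.

```python
import pandas as pd

df = pd.DataFrame(
    [[34, "Sydney.Au123"], [30, "Delhi.As1"], [16, "New York.US3"]],
    columns=["a", "b"],
)

# partition splits at the first "." into three columns: head (0), separator (1), tail (2)
parts = df["b"].str.partition(".")

# Rebuild the string from the head, the period, and the first two characters of the tail
df["b"] = parts[0] + parts[1] + parts[2].str[:2]
print(df)
```

Unlike the regex route, a value with no period keeps its full text here, because `partition` returns it as the head with an empty separator and tail.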
bharatk