I have sample strings in a pandas DataFrame and need to extract the year from each one. I tried the pandas `str.extract` method with a regular expression, but it did not give the result I expected.

Input:

Césio 137 - O Pesadelo de Goiânia (1990)

Nattbuss 807 (1997)

Νόμος 4000 (1962)

Output:

1990

1997

1962

I tried the following regex: \d\d\d\d

For the string Νόμος 4000 (1962), this does not give the expected result: it matches 4000, but I want to extract only 1962.
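
The behaviour described can be reproduced with the minimal sketch below. It assumes the titles are held in a pandas Series; the names `titles` and `first_match` are illustrative only. `str.extract` requires a capture group and returns only the first match per string, which is why the last title yields 4000:

```python
import pandas as pd

# Sample titles from the question (the Series name is an assumption)
titles = pd.Series([
    "Césio 137 - O Pesadelo de Goiânia (1990)",
    "Nattbuss 807 (1997)",
    "Νόμος 4000 (1962)",
])

# The first run of four digits wins, wherever it appears in the string
first_match = titles.str.extract(r"(\d\d\d\d)", expand=False)
print(first_match.tolist())  # ['1990', '1997', '4000']
```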

I am aiming to extract the year from the expressions given.

Thanks in advance.

SAI SRIKAR
  • Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). “Show me how to solve this coding problem” is not a Stack Overflow issue. We expect you to make an honest attempt, and *then* ask a *specific* question about your algorithm or technique. Stack Overflow is not intended to replace existing documentation and tutorials. Asking for tutorial references or personal help is off-topic here. – Prune Feb 05 '21 at 18:32
  • Please give a [mre] of what was unsuccessful. You can also read https://stackoverflow.com/q/4736/3001761 – jonrsharpe Feb 05 '21 at 18:32
  • My sincere apologies. I have updated my issue. I shall make sure that it won't be repeated. – SAI SRIKAR Feb 05 '21 at 18:40
  • From the duplicate: `df['col'].str.extract('.*\((.*)\).*')`. If the duplicate answer is helpful, be sure to upvote it. – Trenton McKinney Feb 05 '21 at 19:44

2 Answers

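A simple regex that captures only the digits inside the parentheses is enough.

import io
import re

import pandas as pd

df = pd.read_csv(io.StringIO("""Césio 137 - O Pesadelo de Goiânia (1990)
Nattbuss 807 (1997)
Νόμος 4000 (1962)"""), names=["input"])

# Capture only the digits enclosed in parentheses
myre = re.compile(r".*\(([0-9]+)\).*")
df.assign(output=df.input.str.extract(myre, expand=False))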

Output:

                                      input  output
0  Césio 137 - O Pesadelo de Goiânia (1990)    1990
1                       Nattbuss 807 (1997)    1997
2                         Νόμος 4000 (1962)    1962
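
As a possible variation, the sketch below tightens the pattern to exactly four digits in parentheses at the end of the title and converts the result to an integer column. It assumes the year always closes the string; the column name `year` and the nullable `Int64` dtype are choices made here for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"input": [
    "Césio 137 - O Pesadelo de Goiânia (1990)",
    "Nattbuss 807 (1997)",
    "Νόμος 4000 (1962)",
]})

# Exactly four digits inside parentheses, anchored to the end of the string
year_text = df["input"].str.extract(r"\((\d{4})\)\s*$", expand=False)

# Titles that do not match become <NA> rather than raising an error
df["year"] = pd.to_numeric(year_text, errors="coerce").astype("Int64")
print(df["year"].tolist())  # [1990, 1997, 1962]
```

Anchoring with `$` keeps a parenthesised number earlier in a title from being picked up by mistake.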
Rob Raymond

This works without a regex, assuming the year is always the last space-separated token:

strings = ["Césio 137 - O Pesadelo de Goiânia (1990)", "Nattbuss 807 (1997)", "Νόμος 4000 (1962)"]

for string in strings:
    # The last token looks like "(1990)"; strip the surrounding parentheses
    last_token = string.split(" ")[-1]
    print(last_token.strip("()"))

Result:

1990
1997
1962
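
Since the question is about a DataFrame column, the same split-based idea can be sketched with pandas string methods instead of a Python loop. This is only a sketch under the same assumption (the year is the final token); the names `titles` and `years` are illustrative.

```python
import pandas as pd

titles = pd.Series([
    "Césio 137 - O Pesadelo de Goiânia (1990)",
    "Nattbuss 807 (1997)",
    "Νόμος 4000 (1962)",
])

# Split once from the right, keep the last token, strip the parentheses
years = titles.str.rsplit(" ", n=1).str[-1].str.strip("()")
print(years.tolist())  # ['1990', '1997', '1962']
```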
Joaquín