I have a pandas DataFrame in which one of the columns (df['data']) contains the following data:

[{'validFrom': '2009-02-16', 'validTo': None, 'country': ['NL', 'BE', 'US'],
'model': ['Free']}]

I tried to extract the different values using regex:

df['data'].str.extract(r"\'validFrom\': \'(.*?)\',")

When I test this in an online regex tester it works, but when I run it in my script it returns NaN.
I basically want to extract the values of all fields (validFrom, validTo, country and model).



Example dataframe; the [..] stands for the data shown above.

|----------------|-------------|-------------|------------------|
|      code      |     name    |      type   |     data         |
|----------------|-------------|-------------|------------------|
|      003       |     WMG     |      other  |      [..]        |



What am I doing wrong?

Claudine
  • Can you show an example of what the dataframe looks like? It looks like you have a dict in the df, not a string? – LeoE Feb 15 '20 at 18:09
  • Are you trying to apply a regex to a dictionary? – kpie Feb 15 '20 at 18:13
  • @LeoE I've added the table; I wasn't sure how to format it. For now the dataframe is just 1 row – Claudine Feb 15 '20 at 18:25
  • The important part is missing... What exactly is `'data'`? Is it a dict or a string? – LeoE Feb 15 '20 at 18:27
  • Thanks for the reference to the other question. I didn't consider it as a dictionary. By using `json_normalize` I found a solution :) – Claudine Feb 15 '20 at 18:38
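
The `json_normalize` approach mentioned in the last comment is not spelled out in the thread, so the following is only a minimal sketch of how it might look. It assumes each cell of `df['data']` holds a Python list of dicts (not a string), which would also explain why the `.str.extract` call finds nothing to match. The one-row frame and the variable names `exploded`, `fields` and `result` are made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame mirroring the question; each 'data' cell is
# assumed to be a list containing one dict, not a string.
df = pd.DataFrame({
    "code": ["003"],
    "name": ["WMG"],
    "type": ["other"],
    "data": [[{"validFrom": "2009-02-16", "validTo": None,
               "country": ["NL", "BE", "US"], "model": ["Free"]}]],
})

# Turn each list element into its own row so every cell holds a single dict.
exploded = df.explode("data").reset_index(drop=True)

# Expand the dicts into one column per key (validFrom, validTo, country, model).
fields = pd.json_normalize(exploded["data"].tolist())

# Put the new columns next to the original ones and drop the raw 'data' column.
result = exploded.drop(columns="data").join(fields)
print(result)
```

Note that `country` and `model` stay as lists inside their cells here. If the column actually contained the text representation of this structure, it would first have to be parsed (for example with `ast.literal_eval`) before this sketch applies.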
