
I have a pandas.core.series.Series that looks like the one below. When I use type(), I see that each row is a str. I'd like to convert this series of strings into a series of arrays. The main goal is then to be able to replace these values depending on different conditions.

Example dataset (my real dataset has more columns and many more rows):

0   ['5 apples', '2 pears']
1   ['3 apples', '3 pears', '1 pumpkin']
2   ['4 blueberries']
3   ['5 kiwis']
4   ['1 pumpkin']
...  ...

Then, for example, if an array contains the value "1 pumpkin", I'd like to replace that value with "XXXX". The question "pandas create new column based on values from other columns / apply a function of multiple columns, row-wise" was helpful for converting single values, but I haven't been able to replace values inside a series/list/array.

Desired output:

0   ['5 apples', '2 pears']
1   ['3 apples', '3 pears', 'XXX']
2   ['4 blueberries']
3   ['5 kiwis']
4   ['XXX']
...  ...
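A minimal sketch of one possible approach, assuming every row is a string that is a valid Python list literal such as "['5 apples', '2 pears']". It swaps in the standard-library ast.literal_eval (not something used in the answers below) to turn each string into a list, then replaces only the matching items. The column names fruit and id are hypothetical.

```python
import ast

import pandas as pd

# Hypothetical frame: 'fruit' holds the list-like strings from the question,
# 'id' stands in for the other columns of the real dataset.
df = pd.DataFrame({
    "id": [10, 11, 12, 13, 14],
    "fruit": [
        "['5 apples', '2 pears']",
        "['3 apples', '3 pears', '1 pumpkin']",
        "['4 blueberries']",
        "['5 kiwis']",
        "['1 pumpkin']",
    ],
})

# Parse each string as a Python list literal (str -> list).
df["fruit"] = df["fruit"].apply(ast.literal_eval)

# Replace matching items inside each list, leaving the other items untouched.
df["fruit"] = df["fruit"].apply(
    lambda items: ["XXX" if item == "1 pumpkin" else item for item in items]
)

print(df)
```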
psychcoder
  • your example looks like a Series of lists? – Z Li Jan 13 '21 at 21:16
  • @Z Li, I believe so? When I use type() on the entire column, it returns pandas.core.series.Series, and when I use type() on a single row of that column, it returns str. – psychcoder Jan 13 '21 at 21:18

4 Answers


You can use Series.to_numpy() to convert the pandas.core.series.Series to a NumPy array.

itsDV7
  • Thanks! A problem with this solution is that I have many other columns in my dataset. Again, the posted example is only a subset of the data I'm working with. So basically, I need the dataset to look exactly the same, except that each row of that specific column is converted into an array. Do you have any advice for this? – psychcoder Jan 13 '21 at 21:24
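A quick check of why Series.to_numpy() alone does not solve the problem described in the comment: it changes the container, not the elements, so each entry is still the original string. This is a sketch with hypothetical column names; it also shows that selecting df["fruit"] leaves the other columns alone.

```python
import pandas as pd

# Hypothetical two-column frame; only 'fruit' holds list-like strings.
df = pd.DataFrame({
    "id": [10, 11],
    "fruit": ["['5 apples', '2 pears']", "['1 pumpkin']"],
})

# Converting just this one column; 'id' is not involved.
arr = df["fruit"].to_numpy()

# Each entry is still the original string, not a parsed list...
print(isinstance(arr[0], str))  # True
# ...so indexing it returns a single character, as described in the question.
print(arr[0][0])                # [
```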

I'm not sure I understand the question correctly, but is this what you want?

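import pandas as pd

x = pd.Series([['5 apples', '2 pears'],
               ['3 apples', '3 pears', '1 pumpkin'],
               ['4 blueberries'],
               ['5 kiwis'],
               ['1 pumpkin']])

# Replace only the matching item inside each list, not the whole list
result = [['XXX' if item == '1 pumpkin' else item for item in l] for l in x]
print(result)
# [['5 apples', '2 pears'], ['3 apples', '3 pears', 'XXX'], ['4 blueberries'], ['5 kiwis'], ['XXX']]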

If each element of your Series x is a str rather than a list, convert each str to a list with:

pd.Series([s.strip("']['").split("', '") for s in x])
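One detail worth checking in the one-liner above: pd.Series([...]) builds a new Series with a default 0..n-1 index, which only lines up with the original rows if the original index was also the default. A sketch with made-up data and a non-default index, showing how passing index=x.index keeps the row labels:

```python
import pandas as pd

# Hypothetical Series of list-like strings with a non-default index,
# e.g. after filtering rows of a larger DataFrame.
x = pd.Series(["['5 apples', '2 pears']", "['1 pumpkin']"], index=[3, 7])

# A bare list comprehension rebuilds the Series with a fresh 0..n-1 index.
lost = pd.Series([s.strip("']['").split("', '") for s in x])

# Passing index=x.index keeps the original row labels, so the result
# aligns correctly when assigned back to a DataFrame column.
kept = pd.Series([s.strip("']['").split("', '") for s in x], index=x.index)

print(lost.index.tolist())  # [0, 1]
print(kept.index.tolist())  # [3, 7]
```

Using x.apply(...) instead of a list comprehension would also keep the original index.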
Z Li
  • Yes, this is half of what I'm looking for! However, my initial problem is that each row of that column is not already a list; it's a string. For example, if I use df[that_column][0][0], I get '['. So I'm first having trouble converting the values of this one column into lists. Thanks in advance for your feedback! – psychcoder Jan 13 '21 at 21:33
  • you mean like one single element in that Series is `"['5 apples', '2 pears']"` ? – Z Li Jan 13 '21 at 21:39
  • Yes, exactly. If I isolate that row with df[that_column][0][0], the output is [. Because my real dataset has more columns, I'm having trouble isolating that one column to convert it into a series of lists. – psychcoder Jan 13 '21 at 21:41
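To work on one column of a wider DataFrame, select it by name, transform it, and assign the result back under the same name; the other columns are left as they were. A sketch using the strip/split parsing from the thread, with hypothetical column names. The question's "different conditions" are not specified, so the replacement dict below is only an illustration.

```python
import pandas as pd

# Hypothetical DataFrame: 'fruit' is the list-like string column,
# 'store' and 'qty' stand in for the other columns.
df = pd.DataFrame({
    "store": ["A", "B", "C"],
    "qty": [7, 7, 4],
    "fruit": [
        "['5 apples', '2 pears']",
        "['3 apples', '3 pears', '1 pumpkin']",
        "['4 blueberries']",
    ],
})

# Illustrative replacement rules; any number of item -> label pairs works.
replacements = {"1 pumpkin": "XXX", "4 blueberries": "YYY"}


def parse_and_replace(s):
    """Turn "['a', 'b']" into ['a', 'b'] and swap items found in `replacements`."""
    items = s.strip("'[]").split("', '")
    return [replacements.get(item, item) for item in items]


# Select the one column by name and assign the result back to it;
# 'store' and 'qty' are not touched.
df["fruit"] = df["fruit"].apply(parse_and_replace)
print(df)
```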

With a DataFrame df and the column name stored in a_column:

df[a_column]=df[a_column].apply(lambda x: [i.replace("1 pumpkin","XXXX") for i in x.strip("['']").split("', '")])
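One thing to keep in mind with this answer: str.replace substitutes substrings, not whole items. If the data can contain an item such as "11 pumpkins" (a made-up example, not from the question), an equality check is safer:

```python
# str.replace works on substrings, so it can also hit longer items that
# merely contain "1 pumpkin" (hypothetical example item "11 pumpkins").
items = ["3 apples", "1 pumpkin", "11 pumpkins"]

substring = [i.replace("1 pumpkin", "XXXX") for i in items]
exact = ["XXXX" if i == "1 pumpkin" else i for i in items]

print(substring)  # ['3 apples', 'XXXX', '1XXXXs']
print(exact)      # ['3 apples', 'XXXX', '11 pumpkins']
```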

dream_dev

Assuming s is your Series:

s.apply(lambda x: x.strip("'[]").split("', '"))
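A sketch of one edge case for the strip/split approach, assuming some rows could hold an empty list stored as the string "[]" (the question does not say whether that happens). The standard-library ast.literal_eval is swapped in here for comparison:

```python
import ast

s = "[]"  # hypothetical row holding an empty list

# The strip/split approach yields a list with one empty string...
print(s.strip("'[]").split("', '"))  # ['']
# ...whereas ast.literal_eval parses it as an empty list.
print(ast.literal_eval(s))           # []
```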
Pablo C