I'm trying to convert a pandas column from string to set so I can perform set operations (-) and methods (.union) between two DataFrames on their set_array columns. The data is imported from two CSV files, each with a set_array column. However, once I run pd.read_csv in pandas, the column's type becomes str, which prevents me from using set operations and methods.
csv1:
set_array
0 {985,784}
1 {887}
2 set()
3 {123,469,789}
4 set()
After loading csv1 into a DataFrame using df = pd.read_csv(csv1), the data type becomes str, and when I access the first element using df['set_array'].values[0], I get the following:
'{985, 784}'
However, if I create my own DataFrame with a set column using df1 = pd.DataFrame({'set_array':[{985, 784},{887},set(),{123, 469, 789},set()]}), and access the first element again using df1['set_array'].values[0], I get the following (desired output):
{985, 784} <-without the ''
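To make the difference reproducible without my actual files, here is a minimal sketch that round-trips a set column through CSV text in memory. The in-memory buffer is only a stand-in for csv1, and I'm assuming the file was written with something like to_csv, which stores each set as its string form:

```python
import io
import pandas as pd

# DataFrame holding real set objects (set() is the empty set; {} would be a dict).
df1 = pd.DataFrame({'set_array': [{985, 784}, {887}, set(), {123, 469, 789}, set()]})

# Round-trip through CSV text: to_csv writes str(value) for each set,
# quoting the values that contain commas.
buffer = io.StringIO()
df1.to_csv(buffer, index=False)
buffer.seek(0)
df = pd.read_csv(buffer)

print(type(df1['set_array'].values[0]))  # <class 'set'>
print(type(df['set_array'].values[0]))   # <class 'str'>
```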
Here is what I tried so far:
1) df.replace('set()', '') <-removes the set() portion from df
2) df['set_array'] = df['set_array'].apply(set) <-does not work
3) df['set_array'] = df['set_array'].apply(lambda x: {x}) <-does not work
4) df['set_array'].astype(int) <-convert to int first then convert to set, does not work
5) df['set_array'].astype(set) <-does not work
6) df['set_array'].to_numpy() <-convert to array, does not work
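To show what attempts 2 and 3 actually produce, here is a small sketch on a single value rather than the whole column. The variable names are only for illustration:

```python
value = '{985, 784}'  # the string read_csv returns for the first row

# Attempt 2: set() iterates over the string, so the result holds single characters.
as_chars = set(value)
print(as_chars == {'{', '9', '8', '5', ',', ' ', '7', '4', '}'})  # True

# Attempt 3: {x} wraps the whole string in a one-element set.
wrapped = {value}
print(wrapped == {'{985, 784}'})  # True
```

In both cases the result is technically a set, but not a set of the integers I want.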
I'm also considering converting the column to set at the pd.read_csv stage as a potential solution.
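Something along these lines is what I have in mind, though I'm not sure it is the idiomatic way. It is only a sketch: parse_set is a hypothetical helper of my own, the inline CSV text stands in for csv1, and I'm assuming the values are always either 'set()' or a brace-enclosed list of integers:

```python
import ast
import io
import pandas as pd

def parse_set(text):
    """Hypothetical helper: turn '{985, 784}' or 'set()' into a real set."""
    text = text.strip()
    if text == 'set()':
        # Handle the empty set explicitly rather than relying on literal_eval for it.
        return set()
    # literal_eval safely evaluates a set literal such as '{985, 784}'.
    return set(ast.literal_eval(text))

# Stand-in for csv1; values containing commas are quoted so they stay in one field.
csv_text = 'set_array\n"{985,784}"\n{887}\nset()\n"{123,469,789}"\nset()\n'

# converters applies parse_set to every raw string in the set_array column.
df = pd.read_csv(io.StringIO(csv_text), converters={'set_array': parse_set})
print(df['set_array'].values[0] == {985, 784})  # True
```

The same helper could presumably be applied after loading with df['set_array'].apply(parse_set) instead of at read time.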
Is there any way to load a CSV using pandas and keep the set data type, or simply convert the column from str to set so it looks like the desired output above?
Thanks!!