I'm trying to clean a dataset that I scraped from abjjad. It has five columns: book_title, author, Cover_url, genres, and descriptions.
For the genres column, the scraped data has the following format:
روايات وقصص روايات اجتماعية | روايات وقصص روايات واقعية |
Here is an image of exactly how it looks in VS Code.
I want to turn this into a list with each genre as a separate element. Genres are separated both by a newline and by '|'. First, I used this line to split on the '|':
df = pd.read_csv("/data/abjjad.csv",converters={'genres': lambda x: x[1:-1].split('|')})
With that, I was able to get this:
['روايات وقصص\nروايات اجتماعية\n', '\nروايات وقصص\nروايات واقعية\n']
but the desired output is this:
['روايات وقصص' ,'روايات اجتماعية','روايات وقصص', 'روايات واقعية']
I've looked into many questions similar to mine but haven't found a solution that works for me.
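To make the goal concrete, here is a minimal sketch of the kind of converter I am after, assuming the only separators in a cell are '|' and newlines. The function name split_genres and the sample string are made up for illustration; the sample is my guess at what one raw cell looks like before any processing.

```python
import re


def split_genres(cell: str) -> list:
    """Split a raw genres cell on '|' and newlines, dropping empty pieces."""
    # One or more consecutive '|' or newline characters count as one separator.
    parts = re.split(r"[|\n]+", cell)
    # Strip stray whitespace and discard the empty strings left at the edges.
    return [part.strip() for part in parts if part.strip()]


# Assumed shape of one raw cell (hypothetical sample, not read from the real file).
raw = "\nروايات وقصص\nروايات اجتماعية\n|\nروايات وقصص\nروايات واقعية\n|"

genres = split_genres(raw)
print(genres)
```

If this is the right idea, the function could presumably be passed in place of the lambda, i.e. converters={'genres': split_genres} in the pd.read_csv call above, but I have not confirmed that it handles every row in the file.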