My goal is to separate data stored in cells to multiple columns in the same row.
For example, I would like to take data that looks like this:
Row 1: [<1><2>][<3><4>][][]
Row 2: [<1><2>][<3><4>][][]
Into data that looks like this:
Row 1: [1][2][3][4]
Row 2: [1][2][3][4]
I tried using the code below to pull the csv and separate each line at the ">"
df = pd.read_csv('file.csv', engine='python', sep="\*>", header=None)
However, the code did not function as anticipated. Instead, the separation occurred at seemingly random and unpredictable points (I'm sure there's a pattern but I don't see it.) And each break created another row as opposed to another column. For example:
Row 1: [<1>][<2>]
Row 2: [<3>]
Row 3: [<4>]
I thought the issue might lie with reading the CSV file so I tried just re-scraping the site with the separator included but it produced the same results so I'm assuming its an issue with the separator call. However, I found that call after trying many others that caused various errors. For example, when I tried using sep = '>'
I got the following error: ParserError: '>' expected after '"'
and when I tried sep = '\>'
, I got the following error: ParserError: Expected 36 fields in line 1106, saw 120. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
These errors sent me looking though multiple resources including this and this among others.
However, I have find no resources that have successfully demonstrated how I can separate each column within a row following the use of a '>' delimiter. If anyone knows how to do this, please let me know. Your help is much appreciated!
Update:
Here is an actual screenshot of the CSV file for a better understanding of what I was trying to demonstrate above. My end goal is to have all the data is columns I+ have data on one descriptive factor as opposed to many as they do now.