I have a data frame that looks like this:
Statement Standard A Standard B Standard C
Statement 1 A1 B-1.2.3, 1.2.4
Statement 1 A1 C2, 3, 4
Statement 2 A2 C5
Statement 3 A3 B-1.2.5
Statement 3 A3 C6,7
{
"Statement": ["Statement 1", "Statement 1", "Statement 2",
"Statement 3", "Statement 3",],
"Standard A": ["A1", "A1", "A2", "A3", "A3"],
"Standard B": ["B-1.2.3, 1.2.4", np.nan, np.nan, "B-1.2.5", np.nan],
"Standard C": [np.nan, "C2, 3, 4 ", "C5 ", np.nan, "C6,7"],
}
Basically what I need to do is convert it to this:
Statement Standard A Standard B Standard C
Statement 1 A1 B-1.2.3 C2
Statement 1 A1 B-1.2.4 C3
Statement 1 A1 C4
Statement 2 A2 C5
Statement 3 A3 B-1.2.5 C6
Statement 3 A3 C7
To help clarify, there are 3 Standards. Each standard has a number of standard "references" that map to the "statements".
So standards B-1.2.3 and B-1.2.4 (from Standard B) and C2, C3, and C4 (from Standard C) map to Statement 1. A1 (from Standard A) does as well, but Standard A is already fully mapped.
Right now I have three issues:
The standard prefixes ("B-" and "C") are only applied to the first references in the line. Each standard reference needs the prefix. (Optional)
The individual standard references (within the Standard) are listed on the same row (with ',' delimiter), and each references needs to be on its own row
The standards are staggered (i.e. B-1.2.3 and C2 can be in the same row, but they are not)
I've tried applying the solution here: Split cell into multiple rows in pandas dataframe, but my columns have different lengths and so I get the error stating that fact.
I also tried to adapt this solution, also without success: Pandas dataframe: how do I split one row into multiple rows by multi-value column?
I'm not convinced that either of these solutions is even applicable. I may be barking up the wrong tree.
I'm pretty lost at this point.