Take the example:
import pandas as pd

df = pd.DataFrame(
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "group_by"],
    data=[
        ["v1", [{"x_title": "xt1", "x_label": "xl1", "y_title": "yt1", "y_label": "yl1"},
                {"x_title": "xt1", "x_label": "xl1", "y_title": "yt2", "y_label": "yl2"},
                {"x_title": "xt2", "x_label": "xl2", "y_title": "yt3", "y_label": "yl3"}],
         "v3", "x"],
        ["v1", [{"x_title": "xt1", "x_label": "xl1", "y_title": "yt1", "y_label": "yl1"},
                {"x_title": "xt2", "x_label": "xl2", "y_title": "yt2", "y_label": "yl2"},
                {"x_title": "xt3", "x_label": "xl3", "y_title": "yt3", "y_label": "yl3"}],
         "v3", "y"],
    ],
)
Which produces a dataframe like:
    c1                                                 c2  c3 group_by
r1  v1  [{"x_title": "xt1", "x_label": "xl1", "y_title...  v3        x
r2  v1  [{"x_title": "xt1", "x_label": "xl1", "y_title...  v3        y
The column that matters here is c2, which holds a list of dictionaries. I would like to "pull out" each unique value of the key-value pair whose key is f"{parent.group_by}_title". So if group_by on the row is "x", each unique x_title value gets its own "child" row that keeps all the remaining values.
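To make the rule concrete for a single row, here is a minimal plain-Python sketch. The helper name split_by_title is hypothetical, and it assumes that "the remaining values" means every key that does not start with the group_by prefix (so x_label is dropped along with x_title, as in the desired output).

```python
from collections import defaultdict


def split_by_title(records, group_by):
    """Group dicts by their f"{group_by}_title" value (hypothetical helper).

    Assumption: every key starting with f"{group_by}_" belongs to the
    grouped side and is removed from the child dictionaries.
    """
    prefix = f"{group_by}_"
    title_key = f"{group_by}_title"
    children = defaultdict(list)
    for rec in records:
        # Keep only the keys that do not belong to the grouped side.
        rest = {k: v for k, v in rec.items() if not k.startswith(prefix)}
        children[rec[title_key]].append(rest)
    return dict(children)


# The c2 value of row r1 from the example above.
r1_c2 = [
    {"x_title": "xt1", "x_label": "xl1", "y_title": "yt1", "y_label": "yl1"},
    {"x_title": "xt1", "x_label": "xl1", "y_title": "yt2", "y_label": "yl2"},
    {"x_title": "xt2", "x_label": "xl2", "y_title": "yt3", "y_label": "yl3"},
]

print(split_by_title(r1_c2, "x"))
```

For r1 this gives two groups: xt1 with two y-only dictionaries and xt2 with one, which is what the child rows for r1 should carry in c2.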
The end dataframe will hopefully look similar to:
pd.DataFrame(
    index=["r1", "r2", "r1", "r1", "r2", "r2", "r2"],
    columns=["group", "orig_name", "c2", "c3", "type"],
    data=[
        [None, "v1", None, "v3", "parent"],
        [None, "v1", None, "v3", "parent"],
        ["xt1", "v1", [{"y_title": "yt1", "y_label": "yl1"}, {"y_title": "yt2", "y_label": "yl2"}], "v3", "child"],
        ["xt2", "v1", [{"y_title": "yt3", "y_label": "yl3"}], "v3", "child"],
        ["yt1", "v1", [{"x_title": "xt1", "x_label": "xl1"}], "v3", "child"],
        ["yt1", "v1", [{"x_title": "xt2", "x_label": "xl2"}], "v3", "child"],
        ["yt1", "v1", [{"x_title": "xt3", "x_label": "xl3"}], "v3", "child"],
    ],
)
or like:
group orig_name c2 c3 type
r1 None v1 None v3 r1 parent
r2 None v1 None v3 r2 parent
r1 xt1 v1 [{"y_title": "yt1", "y_label": "yl1"}, {"y_tit... v3 r1 child
r1 xt2 v1 [{"y_title": "yt3", "y_label": "yl3"}] v3 r1 child
r2 yt1 v1 [{"x_title": "xt1", "x_label": "xl1"}] v3 r2 child
r2 yt1 v1 [{"x_title": "xt2", "x_label": "xl2"}] v3 r2 child
r2 yt1 v1 [{"x_title": "xt3", "x_label": "xl3"}] v3 r2 child
I'm able to do this "manually" by iterating through each row and going from there, but I would like a more "pandas" answer if possible. I've dabbled in apply(lambda ...), explode, a def children(row) function passed into apply(lambda ...), and a few other methods. None of them quite did what I'm after, and none came close enough to be worth detailing here.
I'm sure (as always) I'm somehow overthinking or overlooking a simple solution, am just not aware of a function that would help, or am just not using one of my attempts correctly.
I'm hoping someone can explain a better way of doing this. Thank you!
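One possible direction, offered as a sketch under assumptions rather than a definitive answer: explode c2 so each dictionary gets its own row, derive the group from each row's own group_by value, then regroup per original row and title. The intermediate names (long, children, parents, result) are illustrative, orig_name is assumed to be a rename of c1, and keys are assumed to be dropped by group_by prefix. Note that for r2 this yields the groups yt1, yt2, yt3, following the input data, rather than three yt1 rows.

```python
import pandas as pd

df = pd.DataFrame(
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "group_by"],
    data=[
        ["v1", [{"x_title": "xt1", "x_label": "xl1", "y_title": "yt1", "y_label": "yl1"},
                {"x_title": "xt1", "x_label": "xl1", "y_title": "yt2", "y_label": "yl2"},
                {"x_title": "xt2", "x_label": "xl2", "y_title": "yt3", "y_label": "yl3"}],
         "v3", "x"],
        ["v1", [{"x_title": "xt1", "x_label": "xl1", "y_title": "yt1", "y_label": "yl1"},
                {"x_title": "xt2", "x_label": "xl2", "y_title": "yt2", "y_label": "yl2"},
                {"x_title": "xt3", "x_label": "xl3", "y_title": "yt3", "y_label": "yl3"}],
         "v3", "y"],
    ],
)

# One row per dictionary in c2; the original index label is repeated.
long = df.explode("c2")

# The group is the title named by each row's own group_by value.
long["group"] = [rec[f"{g}_title"] for rec, g in zip(long["c2"], long["group_by"])]

# Assumption: drop every key that starts with the group_by prefix.
long["c2"] = [
    {k: v for k, v in rec.items() if not k.startswith(f"{g}_")}
    for rec, g in zip(long["c2"], long["group_by"])
]

# Collect the remaining dictionaries back into one list per (row, group).
children = (
    long.rename_axis("row")
    .reset_index()
    .groupby(["row", "group"], sort=False)
    .agg(orig_name=("c1", "first"), c2=("c2", list), c3=("c3", "first"))
    .reset_index(level="group")
    .rename_axis(None)
    .assign(type="child")
)

# Parent rows keep the scalar columns and blank out group and c2.
parents = (
    df.rename(columns={"c1": "orig_name"})
    .assign(group=None, c2=None, type="parent")
    .drop(columns="group_by")
)

cols = ["group", "orig_name", "c2", "c3", "type"]
result = pd.concat([parents[cols], children[cols]])
print(result)
```

The list comprehensions still visit every dictionary in Python, so this is not fully vectorized; the gain is mostly that explode and groupby handle the row bookkeeping and keep the original index labels on the child rows.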