I have a dataframe with two columns, 'text' and 'lang', and I need to extract the unique 'text' values that all appear in the same N languages. For example, given the following dataframe:
text lang
--------------
text_a en
text_b es
text_a es
text_a it
text_c de
text_c pt
text_d no
...
I can extract the list of languages per unique text:
df.groupby('text').lang.apply(list)
and that gives me a result like this one:
text_a -> [es, en, it, fr]
text_b -> [es, it, de]
text_c -> [es, nl, it]
text_d -> [fr, no, de, pt]
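For reference, a minimal reproducible version of this step might look like the following. It uses only the rows visible in the example table, so its output is shorter than the result shown above; the variable name langs_per_text is just illustrative.

```python
import pandas as pd

# Only the rows visible in the example table; the real dataframe has more.
df = pd.DataFrame({
    "text": ["text_a", "text_b", "text_a", "text_a", "text_c", "text_c", "text_d"],
    "lang": ["en", "es", "es", "it", "de", "pt", "no"],
})

# One list of languages per unique text, in order of appearance.
langs_per_text = df.groupby("text")["lang"].apply(list)
print(langs_per_text)
```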
Now, from this result, how can I filter all the texts that appear in the same N languages? For example, for Spanish and French, the desired result would be all the rows from the initial dataframe whose selected 'text' values have both 'es' and 'fr' in the 'lang' column:
text lang
--------------
text_a fr
text_b es
text_a es
text_b es
text_b fr
text_c fr
text_d es
...
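The output contains all texts that have a row with 'es' and a row with 'fr', and only rows for those two languages appear in the output. The isin() function alone will not work here, because it would also keep texts that have only one of the two languages.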
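To make the requirement concrete, here is a rough sketch of the kind of filter I mean, not necessarily the best way to do it. The data and the names wanted, keep and result are made up for illustration, and it assumes my reading above: keep only texts that have every wanted language, then keep only the rows for those languages.

```python
import pandas as pd

# Hypothetical data: text_a and text_b have both 'es' and 'fr',
# text_c has only 'fr', text_d has only 'es'.
df = pd.DataFrame({
    "text": ["text_a", "text_a", "text_a", "text_b", "text_b", "text_c", "text_d"],
    "lang": ["fr", "es", "it", "es", "fr", "fr", "es"],
})

wanted = {"es", "fr"}

# Set of languages per text, then the texts whose set contains every wanted one.
langs = df.groupby("text")["lang"].apply(set)
keep = langs[langs.apply(wanted.issubset)].index

# Group-level check first (text is in keep), then restrict to the wanted languages.
result = df[df["text"].isin(keep) & df["lang"].isin(wanted)]
print(result)
```

Here isin() is only used after the group-level check; applied to 'lang' on its own it would also return text_c and text_d.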
Thanks in advance.