The way I see it, you have two options:
Option 1: store the sub-reviews as lists
In this option, for every `review`, you create two lists: one to store the values of `sub_str_1_i` and another for the values of `sub_str_i_n`. Then you add those lists as new columns in their respective rows. Here's an example:
```python
import pandas as pd

# == Create some dummy data ====================================================
correct_X_test = pd.DataFrame({"review": ["This is a review",
                                          "This is another review",
                                          "This is a third review"]})

# == Solution 1 ================================================================
# Object columns that will hold one list per row
correct_X_test['1_i'] = None
correct_X_test['i_n'] = None

for j, row in correct_X_test.iterrows():
    input_list = row["review"].split()
    sub_list_1_i, sub_list_i_n = [], []
    for i in range(len(input_list)):
        # Build the sequence from x1 to xi
        sub_str_1_i = " ".join(input_list[:i+1])
        # Build the sequence from xi to xn
        sub_str_i_n = " ".join(input_list[i:])
        sub_list_1_i.append(sub_str_1_i)
        sub_list_i_n.append(sub_str_i_n)
    # .at sets a single cell, so the whole list is stored in that cell
    correct_X_test.at[j, '1_i'] = sub_list_1_i
    correct_X_test.at[j, 'i_n'] = sub_list_i_n

print(correct_X_test)
# Prints:
#
# review 1_i \
# 0 This is a review [This, This is, This is a, This is a review]
# 1 This is another review [This, This is, This is another, This is anoth...
# 2 This is a third review [This, This is, This is a, This is a third, Th...
# i_n
# 0 [This is a review, is a review, a review, review]
# 1 [This is another review, is another review, an...
# 2 [This is a third review, is a third review, a ...
```
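If you'd rather skip the explicit loop and the placeholder `None` columns, the same two list columns can be built with `Series.apply`. This is a minimal sketch, assuming every review is a plain string that should be split on whitespace; `prefixes` and `suffixes` are hypothetical helper names, not part of pandas. It is shorter, but note that `apply` still calls Python code once per row, so don't expect it to be dramatically faster.

```python
import pandas as pd

# Same dummy data as above
correct_X_test = pd.DataFrame({"review": ["This is a review",
                                          "This is another review",
                                          "This is a third review"]})


def prefixes(text):
    """Hypothetical helper: sub-strings x1..xi for every word position i."""
    words = text.split()
    return [" ".join(words[:i + 1]) for i in range(len(words))]


def suffixes(text):
    """Hypothetical helper: sub-strings xi..xn for every word position i."""
    words = text.split()
    return [" ".join(words[i:]) for i in range(len(words))]


# Series.apply returns one list per row, stored in an object-dtype column
correct_X_test["1_i"] = correct_X_test["review"].apply(prefixes)
correct_X_test["i_n"] = correct_X_test["review"].apply(suffixes)
print(correct_X_test)
```

The resulting frame has the same `1_i` and `i_n` list columns as the loop version, so it can be fed straight into Option 2.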
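Option 2: create a new row for every pair of `sub_str_1_i` and `sub_str_i_n`
In this option, each pair of `sub_str_1_i` and `sub_str_i_n` (the prefix ending at word i and the suffix starting at word i) is stored as a new row in the dataframe. You can use the method `pd.DataFrame.explode` to convert the output from Option 1 into new rows. Passing a list of columns explodes them together, element by element, so both columns must hold lists of the same length in each row; exploding several columns at once requires pandas 1.3.0 or later: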
```python
correct_X_test.explode(['i_n', '1_i'])
# Returns:
#
#                    review                     1_i                     i_n
# 0        This is a review                    This        This is a review
# 0        This is a review                 This is             is a review
# 0        This is a review               This is a                a review
# 0        This is a review        This is a review                  review
# 1  This is another review                    This  This is another review
# 1  This is another review                 This is       is another review
# 1  This is another review         This is another          another review
# 1  This is another review  This is another review                  review
# 2  This is a third review                    This  This is a third review
# 2  This is a third review                 This is       is a third review
# 2  This is a third review               This is a          a third review
# 2  This is a third review         This is a third            third review
# 2  This is a third review  This is a third review                  review
```
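If you only need the long format, you could also skip the intermediate list columns and build one record per (prefix, suffix) pair directly. This is a sketch under the same dummy data; `reviews`, `split_pairs`, `records` and `long_df` are hypothetical names chosen for illustration. Because it doesn't rely on multi-column `explode`, it should also work on pandas versions older than 1.3.0.

```python
import pandas as pd

# Same dummy data as above (hypothetical name: reviews)
reviews = pd.DataFrame({"review": ["This is a review",
                                   "This is another review",
                                   "This is a third review"]})


def split_pairs(text):
    """Hypothetical helper: pair prefix x1..xi with suffix xi..xn for each i."""
    words = text.split()
    return [(" ".join(words[:i + 1]), " ".join(words[i:]))
            for i in range(len(words))]


# One record per (prefix, suffix) pair, remembering the original row label
records = [
    {"row": idx, "review": text, "1_i": prefix, "i_n": suffix}
    for idx, text in reviews["review"].items()
    for prefix, suffix in split_pairs(text)
]

long_df = pd.DataFrame(records).set_index("row")
long_df.index.name = None  # match the unnamed index that explode produces
print(long_df)
```

As with `explode`, each new row keeps the label of the review it came from; call `long_df.reset_index(drop=True)` if you want a fresh 0..n-1 index instead.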