I am using set on a DataFrame column to remove duplicate words from a list, but the words in the result look different from the originals.

These are the words shown in the DataFrame:

[Which, one, dissolve, in, water, quickly, sugar, ,, salt, ,, methane, and, carbon, di, oxide, ?]

Note: words like 'sugar,' and 'salt,' include the comma.

This is the result shown in the DataFrame after using set:

{oxide, sugar, Which, di, water, in, ,, salt, carbon, dissolve, one, ?, methane, quickly, and}

data['sent1']=data['sent1'].apply(lambda x : set(x))

I want the words to keep their original order after using set. I am also puzzled about why set changes the original words (from 'sugar,' to 'sugar').

1 Answer

If each row in your data frame looks like this:

data.loc[0, "sent1"] = ["Which", "one", "dissolve", "in", "water", "quickly", "sugar", ",", "salt", ",", "methane", "and", "carbon", "di", "oxide", "?"]

Then you could append the comma before applying the set operation, like:

data['sent1'] = data['sent1'].apply(lambda x: set([i + "," for i in x]))
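Note that the line above appends a comma to every token, not only to "sugar" and "salt". If the rows really are token lists like the one shown, set is not altering any word: "," is its own element, and the two copies collapse into one. A minimal sketch of an alternative, assuming the goal is to glue each standalone "," back onto the word before it (`attach_commas` is a hypothetical helper name, not part of pandas):

```python
def attach_commas(tokens):
    """Merge each standalone "," token into the token that precedes it."""
    merged = []
    for tok in tokens:
        if tok == "," and merged:
            # Glue the comma onto the previous word, e.g. "sugar" -> "sugar,"
            merged[-1] += ","
        else:
            merged.append(tok)
    return merged


tokens = ["Which", "one", "dissolve", "in", "water", "quickly", "sugar", ",",
          "salt", ",", "methane", "and", "carbon", "di", "oxide", "?"]
merged_tokens = attach_commas(tokens)
print(merged_tokens)
```

On the data frame this could be applied with data['sent1'].apply(attach_commas) before any deduplication step.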

On the other hand, if each row in `data['sent1']` is one long string of words, for example:

data.loc[0, "sent1"] = "Which one dissolve in water quickly sugar, salt, methane and carbon di oxide ?"

then try:

data['sent1'] = data['sent1'].apply(lambda x: set(x.split(" ")))
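Neither snippet keeps the words in their original order, because a Python set is unordered. A minimal sketch of an order-preserving alternative, assuming Python 3.7+ (where dict keys keep insertion order) and that each row is a list of tokens (`dedupe_keep_order` is a hypothetical helper name):

```python
def dedupe_keep_order(words):
    """Return the unique items of `words` in first-seen order."""
    # dict keys are unique and remember insertion order (Python 3.7+),
    # so this drops later duplicates without reordering anything.
    return list(dict.fromkeys(words))


tokens = ["Which", "one", "dissolve", "in", "water", "quickly", "sugar", ",",
          "salt", ",", "methane", "and", "carbon", "di", "oxide", "?"]
unique_tokens = dedupe_keep_order(tokens)
print(unique_tokens)
```

On the data frame this would be data['sent1'] = data['sent1'].apply(dedupe_keep_order). The result is a list rather than a set, which is what makes the order stable.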
Ted
@jonathanschum Glad it helped, and welcome to SO! Feel free to accept the answer and upvote. – Ted Aug 26 '19 at 08:06