
I have a data frame that looks like this:

print(df)
 Text     
 0|This is a text 
 1|This is also text

What I wish: I would like to loop over the Text column of the data frame and create a new column with the derived information, like this:

   Text             | Derived_text 
 0|This is a text   | Something
 1|This is also text| Something

Code: I have written the following code (I'm using spaCy, by the way):

for i in df['Text'].tolist():
    doc = nlp(i)
    resolved = [(doc._.coref_resolved) for docs in doc.ents]
    df = df.append(pd.Series(resolved), ignore_index=True)

Problem: The problem is that the appended series gets misplaced/mismatched, so it looks like this:

  Text              | Derived_text 
 0|This is a text   | NaN
 1|This is also text| NaN
 2|NaN              | Something
 3|NaN              | Something

I have also tried just saving the results to a list, but the list does not include the NaN values that can occur during the for loop. I need the NaN values to be kept so that I can match the original text with the derived text by index position.

AMC
  • Why are you using `df.append` inside the for loop? Instead, you could just keep the series object you create in the for loop and assign it to the new column. – Anurag Reddy Oct 20 '20 at 22:57
  • So, you propose that I instead save the results to a new Series, and then outside the for loop combine the series with the existing df? – TooManyRightHands Oct 20 '20 at 22:59
  • Yes, just save the derived values in a list or as a pandas series and assign the new column with the values `df['derived_text'] = your_series` – Anurag Reddy Oct 20 '20 at 23:01
  • Please provide a [mcve]. – AMC Oct 20 '20 at 23:09
  • I believe I tried that, but then the NaN values disappear. So I will have one entry for the derived_text but 2 entries for the original text. How should I recognize which entry it belongs to? :) – TooManyRightHands Oct 20 '20 at 23:11
  • @AMC I cannot see the usefulness of your comment. I clearly show the input, the issue with the code and the expected output. Everybody, who shows an interest in actually solving the issue would understand the problem. Please refrain from copy-pasting a oneliner, if you cannot provide constructive feedback to users. – TooManyRightHands Oct 21 '20 at 08:43
  • _I cannot see the usefulness of your comment. I clearly show the input, the issue with the code and the expected output._ The goal of a MCVE is to make reproducing the issue as easy as possible. Currently, I have to write code to parse an ambiguous representation of your data. Never mind the missing imports. You might find [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391) useful. – AMC Oct 22 '20 at 02:08

1 Answer


It appears that you want to add a column, which can be done with pandas' concat function and its axis argument, e.g. pd.concat([df, new_columns], axis=1).
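As a minimal sketch of the concat route, assuming the derived values have already been collected with one entry per row (the values here are made up for illustration, and None marks a row with no result):

```python
import pandas as pd

df = pd.DataFrame({"Text": ["This is a text", "This is also text"]})

# Hypothetical derived values; None marks a row with no result.
derived = pd.Series(["Something", None], index=df.index, name="Derived_text")

# axis=1 places the series next to the existing columns, aligned on the index.
combined = pd.concat([df, derived], axis=1)
print(combined)
```

Because the series shares the data frame's index, the missing entry stays on its own row instead of being appended as a new row at the bottom.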

However, I think you shouldn't use for loops with pandas. What you should probably do is use pandas' apply function, which would look something like:

import pandas as pd

# define your DataFrame (one dict key per column)
df = pd.DataFrame({'a': range(6), 'b': range(1, 7)})

# create the new column from one of them
df['a_squared'] = df['a'].apply(lambda x: x ** 2)
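Applied to the Text column from the question, the same pattern might look like the sketch below. Note that derive is a hypothetical stand-in for the spaCy step (it is not part of spaCy or the original code); the only assumption is that it returns None when there is nothing to derive.

```python
import pandas as pd

df = pd.DataFrame({"Text": ["This is a text", "This is also text"]})

def derive(text):
    """Hypothetical stand-in for the spaCy step; returns None when nothing is found."""
    return text.upper() if "also" in text else None

# apply calls derive once per row and keeps the original index,
# so rows without a result stay in place as missing values.
df["Derived_text"] = df["Text"].apply(derive)
print(df)
```

Since the result is assigned as a column rather than appended as rows, each derived value stays next to the text it came from, and missing results remain as placeholders.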

Maybe you should also look into lambda expressions.

Also, look into this Stack Overflow question.

Hope this helped! Happy coding!

Felipe Whitaker