I'm looking for the proper way to "iterate" over the rows of a dataframe, or rather to get the same result without iterating, since I know that iteration is not the recommended way of handling the data in a dataframe for computations, as explained for instance in this question and in the pandas documentation. To be more precise, let me explain my issue.

I have a dataframe containing start values, end values, and number of steps, e.g.

df_test = pd.DataFrame({"start": [-2.0, -1.0, -5.0 ],
                        "end": [3.0, 1.0, -1.0],
                        "n": [6, 3, 9]
                       })

From this dataframe I would like to create a new column in another, existing dataframe. The column should contain the concatenated linspaces described by the start points, end points, and numbers of points above. The existing dataframe has the matching shape. My current approach uses a list comprehension, concatenates the resulting arrays into a single array, and then adds the column:

linspacePts = np.concatenate([np.linspace(s, e, n) for s,e,n in zip(df_test["start"], df_test["end"], df_test["n"])])
df_other["lin. Pts"] = linspacePts 

My first idea was to use df.apply somehow, but I can't figure out how to tell np.linspace which column corresponds to which of its arguments. The approach above works, but I was hoping for a better solution that avoids the detour via a list and a numpy array.

Thanks for your help!

Chris
kluonk

1 Answer

Use the apply method of the dataframe with axis=1, so that the function receives one row at a time, and index the columns you want with [] syntax.

import numpy as np
import pandas as pd

df_test = pd.DataFrame({"start": [-2.0, -1.0, -5.0 ],
                        "end": [3.0, 1.0, -1.0],
                        "n": [6, 3, 9]
                       })
df_test.apply(lambda row: np.linspace(row["start"], row["end"], row["n"].astype(int)), axis=1)

If you are unfamiliar with lambda functions, the following is equivalent but more verbose.

def create_linspace(row):
    # row is a pd.Series
    return np.linspace(row["start"], row["end"], row["n"].astype(int))

df_test.apply(create_linspace, axis=1)

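Please note that you need to cast the value of n to an integer type, because np.linspace will raise an exception otherwise. With axis=1, each row is passed as a single pd.Series with one dtype, so the integer n is upcast to float together with start and end.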
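To see why the cast is needed, here is a minimal sketch that reuses df_test from above. The names col_dtype and n_types are only for illustration. It compares the dtype of the n column with the type of the value that the function actually receives inside apply.

```python
import pandas as pd

df_test = pd.DataFrame({"start": [-2.0, -1.0, -5.0],
                        "end": [3.0, 1.0, -1.0],
                        "n": [6, 3, 9]})

# The column itself holds integers ...
col_dtype = df_test["n"].dtype

# ... but apply(axis=1) hands each row over as one Series with a single dtype,
# so "n" arrives as a float next to "start" and "end".
n_types = df_test.apply(lambda row: type(row["n"]).__name__, axis=1)

print(col_dtype)
print(n_types.tolist())
```

Using int(row["n"]) instead of row["n"].astype(int) should work just as well here.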

Then you can concatenate the result with np.concatenate. I'm not sure how you were planning to add this array to the dataframe. A dataframe is rectangular, meaning you cannot have rows of unequal lengths. Because your n values are different, you will get arrays of different lengths.
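A minimal sketch of that last step, under the assumption that the target is a separate dataframe with as many rows as the sum of n. The df_other built here, including its "id" column, is a hypothetical stand-in for the one in the question.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({"start": [-2.0, -1.0, -5.0],
                        "end": [3.0, 1.0, -1.0],
                        "n": [6, 3, 9]})

# One array per row; the result is a Series holding arrays of different lengths.
arrays = df_test.apply(
    lambda row: np.linspace(row["start"], row["end"], int(row["n"])),
    axis=1,
)

# Flatten the per-row arrays into a single 1-D array.
points = np.concatenate(arrays.tolist())

# Hypothetical target dataframe with sum(n) rows (assumption, for illustration).
df_other = pd.DataFrame({"id": range(int(df_test["n"].sum()))})
df_other["lin. Pts"] = points
```

The assignment only works if len(points) equals the number of rows in df_other, so it may be worth checking that before adding the column.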

jkr
  • Hey jakub, thanks! I was also looking for a proper way to do this with a lambda function, but I didn't realize that I can access the entries of a row that way. That's exactly what I was looking for! Regarding how to add this to the dataframe: it's a different one with `np.sum(df_test["n"])` rows, so the sizes actually fit. I denoted this second dataframe as `df_other` in the given example. – kluonk May 11 '20 at 10:27