1

So I have a table that looks like the following:

0           1  2          3           4  5
rs10000911  4  144136193  100.000000  -  AC
rs10000988  4  76010255   99.173554   -  AG
rs10002181  4  142250415  100.000000  +  AG
rs10005140  4  22365603   99.173554   +  AG
rs10005242  4  5949558    100.000000  +  AG

Now I want to create an additional column or a Series that combines columns 1 and 2, like this: 4:144136193, 4:76010255, 4:142250415, etc. Currently I am using an iterrows solution:

new_column = pd.Series([])
for index, line in table.iterrows():
    new_column = new_column.append(pd.Series(str(line[1])+':'+str(line[2])))

Because my table contains 800 000 rows, iterrows is very slow. Is there any way to speed this up?

YKY
    Don't iterate over a pandas DataFrame. See the answers in this post: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas/55557758#55557758. If your task cannot be performed with a pandas function, iterate over NumPy arrays – dberezin Mar 16 '21 at 10:01
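
For illustration, the NumPy route mentioned in the comment might look roughly like this. It is only a sketch: the small `table` is rebuilt by hand from the first rows of the question, and integer column labels 0..5 are an assumption.

```python
import pandas as pd

# Assumed sample data: first three rows of the question's table,
# with default integer column labels 0..5.
table = pd.DataFrame([
    ["rs10000911", 4, 144136193, 100.000000, "-", "AC"],
    ["rs10000988", 4, 76010255, 99.173554, "-", "AG"],
    ["rs10002181", 4, 142250415, 100.000000, "+", "AG"],
])

# Pull the two columns out as NumPy arrays once, then loop over plain values
# instead of building a Series object per row as iterrows does.
chrom = table[1].to_numpy()
pos = table[2].to_numpy()
new_column = pd.Series(
    [f"{c}:{p}" for c, p in zip(chrom, pos)],
    index=table.index,
)
print(new_column.tolist())
```

This avoids the repeated `append` calls, which copy the growing Series on every iteration.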

3 Answers

4

You can do this:

df['new_column'] = df[1].astype(str) + ":" + df[2].astype(str)
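
A self-contained version of the line above, as a quick check. The sample `df` is assumed: it is rebuilt by hand from the first rows of the question, with integer column labels. The `str.cat` variant is an equivalent alternative, not something the question requires.

```python
import pandas as pd

# Assumed sample data copied from the question (integer column labels 0..5).
df = pd.DataFrame([
    ["rs10000911", 4, 144136193, 100.000000, "-", "AC"],
    ["rs10000988", 4, 76010255, 99.173554, "-", "AG"],
    ["rs10002181", 4, 142250415, 100.000000, "+", "AG"],
])

# Vectorized: convert both columns to strings and concatenate element-wise.
df["new_column"] = df[1].astype(str) + ":" + df[2].astype(str)

# Equivalent spelling using Series.str.cat with a separator.
alt = df[1].astype(str).str.cat(df[2].astype(str), sep=":")

print(df["new_column"].tolist())
```

Both forms operate on whole columns at once, so no Python-level loop over the 800 000 rows is needed.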

Bart
3
df["new"] = df[["1", "2"]].apply(lambda x: ":".join(map(str, x)), axis=1)
print(df)

Or:

df["new"] = df[["1", "2"]].astype(str).apply(":".join, axis=1)
print(df)

Prints:

            0  1          2           3  4   5          new
0  rs10000911  4  144136193  100.000000  -  AC  4:144136193
1  rs10000988  4   76010255   99.173554  -  AG   4:76010255
2  rs10002181  4  142250415  100.000000  +  AG  4:142250415
3  rs10005140  4   22365603   99.173554  +  AG   4:22365603
4  rs10005242  4    5949558  100.000000  +  AG    4:5949558
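
Note that the snippets above select the columns by the string labels "1" and "2". If the table was read without a header, the labels are probably the integers 1 and 2 instead. A small sketch under that assumption, with the sample `df` rebuilt by hand from the question:

```python
import pandas as pd

# Assumed sample data with default integer column labels 0..5.
df = pd.DataFrame([
    ["rs10000911", 4, 144136193, 100.000000, "-", "AC"],
    ["rs10000988", 4, 76010255, 99.173554, "-", "AG"],
    ["rs10002181", 4, 142250415, 100.000000, "+", "AG"],
])

# Same idea as above, but selecting by integer labels.
df["new"] = df[[1, 2]].astype(str).apply(":".join, axis=1)
print(df["new"].tolist())
```

Keep in mind that `apply(..., axis=1)` still calls the function once per row, so on a large table it will likely be slower than plain column-wise string concatenation.
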
Andrej Kesely
1

You can do:

new_col = table.apply(lambda line: pd.Series(str(line[1])+':'+str(line[2])),axis=1)

This will give you a new dataframe new_col:

             0
0  4:144136193
1   4:76010255
2  4:142250415
3   4:22365603
4    4:5949558

(If you want only a Series, not a DataFrame, new_col[0] will give you one.)
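
As a variation, returning a plain string from the lambda instead of wrapping it in pd.Series should give a Series directly. A minimal sketch, assuming a hand-built `table` with integer column labels as in the question:

```python
import pandas as pd

# Assumed sample data: first three rows of the question's table.
table = pd.DataFrame([
    ["rs10000911", 4, 144136193, 100.000000, "-", "AC"],
    ["rs10000988", 4, 76010255, 99.173554, "-", "AG"],
    ["rs10002181", 4, 142250415, 100.000000, "+", "AG"],
])

# The lambda returns a scalar string, so apply collects the results
# into a Series rather than a one-column DataFrame.
new_col = table.apply(lambda line: f"{line[1]}:{line[2]}", axis=1)
print(type(new_col).__name__)
print(new_col.tolist())
```

This also skips constructing a throwaway Series for every row, although it is still a row-by-row apply.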

zabop