2

I'm trying to add a new column to a dataframe, and fill up that column with multiple other columns in the dataframe concatenated together. How can I do this with the fact that this new column will have a different length than the rest of the columns in the dataframe?

For example:

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

I would want to create a column C in the dataframe that reads 1,1,4,2,3,6 (except vertically)

print (df)
   A  B  C
0  1  2  1 
1  1  3  1
2  4  6  4
3        2
4        3
5        6
Matthew
  • 67
  • 1
  • 7

3 Answers3

7

Slightly modified Anton's solution (will work for any number of columns):

In [99]: df = pd.DataFrame(np.random.randint(0,10,(3,4)), columns=list('ABCD'))

In [100]: df
Out[100]:
   A  B  C  D
0  9  6  9  6
1  1  2  0  8
2  5  0  4  8

In [105]: pd.concat([df, df.T.stack().reset_index(name='new')['new']], axis=1)
Out[105]:
      A    B    C    D  new
0   9.0  6.0  9.0  6.0    9
1   1.0  2.0  0.0  8.0    1
2   5.0  0.0  4.0  8.0    5
3   NaN  NaN  NaN  NaN    6
4   NaN  NaN  NaN  NaN    2
5   NaN  NaN  NaN  NaN    0
6   NaN  NaN  NaN  NaN    9
7   NaN  NaN  NaN  NaN    0
8   NaN  NaN  NaN  NaN    4
9   NaN  NaN  NaN  NaN    6
10  NaN  NaN  NaN  NaN    8
11  NaN  NaN  NaN  NaN    8
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • Thank you for the response, how would I be able to modify this if I wanted to concatenate 3 columns at once instead of two? Would this work if there were more columns in the data set than the ones I want to concatenate? – Matthew Aug 03 '17 at 20:10
  • @MaxU nice modification indeed +1 – Anton vBR Aug 03 '17 at 20:12
  • Thanks, that helps a lot, what if in your example, I just wanted to concatenate columns A, B & C? – Matthew Aug 03 '17 at 20:15
  • @Matthew, `pd.concat([df[['A','B','C']], df.T.stack().reset_index(name='new')['new']], axis=1)` – MaxU - stand with Ukraine Aug 03 '17 at 20:16
  • @MaxU, for some reason I keep getting this error name 'new' is not defined, and when I try setting it equal to df['new'] I get this : Wrong number of items passed 28, placement implies 1 – Matthew Aug 03 '17 at 20:22
  • @Matthew, try to split up this command into pieces and identify which piece produces this error... – MaxU - stand with Ukraine Aug 03 '17 at 20:27
  • sorry, how would I do that exactly? when I run it on my Ipython console it just tells me : name 'new' is not defined – Matthew Aug 03 '17 at 20:29
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/151002/discussion-between-matthew-and-maxu). – Matthew Aug 03 '17 at 20:31
2

How about this:

df1 = pd.DataFrame.from_dict({'A':[1,1,4],'B':[2,3,6]})

concatvalues = np.concatenate([df1.A.values,df1.B.values])

df2 = pd.concat([df1,pd.DataFrame(concatvalues)], ignore_index=True, axis=1)
df2.columns = np.append(df1.columns.values, "concat")

print(df2)

prints

    A   B   concat
0   1.0 2.0 1
1   1.0 3.0 1
2   4.0 6.0 4
3   NaN NaN 2
4   NaN NaN 3
5   NaN NaN 6
Anton vBR
  • 18,287
  • 5
  • 40
  • 46
  • Thank you for the response, is there a way I could do this for a larger dataset where it would be much more difficult to name all the numbers? Also can I concatenate 3 columns at once with this? Or would that require something different? I realize this confusion is due to my bad example, but I hastily inputed it once I realized I needed one – Matthew Aug 03 '17 at 20:09
1

If you have a long list of columns that you need to stack vertically - you can use the following syntax, rather than naming them all inside pd.concat():

pd.concat([df.loc[:, col] for col in df.columns], axis = 0, ignore_index=True)