How can I concat a Series of shape (4,) to a df of shape (1,4) and obtain a df of shape (2,4) without converting the Series to a df first? I am trying to insert a Series as the top row of a df.

For example:

import pandas as pd

mydict1 = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}]
mydict2 = [{'a': 5, 'b': 6, 'c': 7, 'd': 8}]

# 1x4 dataframes
df1 = pd.DataFrame(mydict1)
df2 = pd.DataFrame(mydict2)

# series1.shape: (4,)
series1 = df1.iloc[0]
# df3.shape: (1,4)
df3 = df1.iloc[[0]]

# 5x5 df. With a new row and column representing the indexes of each. If anything, I'd expect a 4x4 df here, not a 5x5.
dfDfSeriesAxis0 = pd.concat([df2, df1.iloc[0]], axis=0)
# 5x5 df.  I would think this is different from the above method with axis=0, but it appears to be identical
dfDfSeriesAxis1 = pd.concat([df2, df1.iloc[0]], axis=1)
# 5x5 df
dfSeriesDfAxis0 = pd.concat([df1.iloc[0], df2])
# 5x5 df
dfSeriesDfAxis1 = pd.concat([df1.iloc[0], df2], axis=1)
# This achieves the result I want (2x4 df) but must convert to a df before concat.
dfDf1Df2Axis0 = pd.concat([df1.iloc[[0]], df2])
# Concats to a 2x4 df, but in the wrong order
dfDf2Df1Axis0 = pd.concat([df2,df1.iloc[[0]]])
# Concats along incorrect axis and I end up with a 1x8 df
dfDf1Df2Axis1 = pd.concat([df1.iloc[[0]], df2], axis=1)
# Appends along correct axis and I end up with a 2x4 df. Why does appending work as expected but concat does not?
dfAppendSeries = df1.append(df2.iloc[-1])
# Appends along correct axis and I end up with a 2x4 df
dfAppendDf = df1.append(df2)

It appears iloc[0] returns a Series while iloc[[0]] returns a dataframe. Furthermore iloc[0:1] appears to return the same dataframe that iloc[[0]] returns.
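The difference can be checked directly by comparing types and shapes. This is a minimal sketch that rebuilds df1 from the example above; the names row_as_series, row_as_frame_list and row_as_frame_slice are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3, 'd': 4}])

row_as_series = df1.iloc[0]         # single integer -> Series
row_as_frame_list = df1.iloc[[0]]   # list of integers -> DataFrame
row_as_frame_slice = df1.iloc[0:1]  # slice -> DataFrame

print(type(row_as_series).__name__, row_as_series.shape)            # Series (4,)
print(type(row_as_frame_list).__name__, row_as_frame_list.shape)    # DataFrame (1, 4)
print(type(row_as_frame_slice).__name__, row_as_frame_slice.shape)  # DataFrame (1, 4)
```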

My main source of confusion is why dfAppendSeries = df1.append(df2.iloc[-1]) results in the expected 2x4 df, whereas dfDfSeriesAxis0 = pd.concat([df2, df1.iloc[0]], axis=0) results in a 5x5 df. I really can't imagine how the resulting df from dfDfSeriesAxis0 = pd.concat([df2, df1.iloc[0]], axis=0) would be useful under any circumstance.

Is there a way to make the returned object from df1.iloc[0] compatible to concat with df2 without making it a dataframe itself? By that I mean giving it the appropriate shape to concat with a (1,4) df and get a df of shape (2,4). I tried transposing series1, but this appears to have no effect on the shape.
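The transpose having no effect is consistent with a Series being one-dimensional, so there is no second axis to swap. A small sketch, assuming a hand-built Series that mirrors series1:

```python
import pandas as pd

# Hand-built stand-in for series1 (same labels and values as df1.iloc[0])
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'], name=0)

print(series1.shape)    # (4,)
print(series1.T.shape)  # (4,)  transposing a 1-D Series leaves the shape unchanged
```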

Although not explicitly stated in this context, according to the docs I would expect to be able to do this:

Returns
object, type of objs
When concatenating all Series along the index (axis=0), a Series is returned. When objs contains at least one DataFrame, a DataFrame is returned. When concatenating along the columns (axis=1), a DataFrame is returned.

topher217
  • Neither of those concat statements are currently callable. Perhaps you mean to have those values in a list? – Henry Ecker Jul 21 '21 at 14:42
  • If that is the case, in the first you're concatenating a Series, which has a single column 0 and indexes a,b,c,d, to a DataFrame with indexes 0,1,2 and columns a,b,c,d. The resulting shape reflects this. – Henry Ecker Jul 21 '21 at 14:43
  • In the second you're concatenating a DataFrame which has 4 columns a,b,c,d and a single row (index 0). – Henry Ecker Jul 21 '21 at 14:44
  • The 1st parameter of the `concat()` method is `objs`, `a sequence or mapping of Series or DataFrame objects`, so you have to pass either a `list` or a `tuple` of DataFrames/Series. – Anurag Dabas Jul 21 '21 at 14:46
  • Thanks for pointing that out. I made an edit with the missing square brackets. – topher217 Jul 21 '21 at 14:47
  • @HenryEcker I added an example of using append in a similar syntax (and it working). Sure append and concat are separate methods, but I assumed they would represent similar behaviors, so find this disconnect fairly confusing. I also tried to transpose the Series first before concatting it, but resulted in the same mess. – topher217 Jul 21 '21 at 14:50
  • A series is neither vertical nor horizontal. It is a list of paired observations which means that it is one dimensional. – j__carlson Jul 21 '21 at 15:24
  • @j__carlson my terminology was lax, but what I mean is the difference between an object of shape (4,) and (1,4). I'd like to know how to concat a Series of shape (4,) to a dataframe of shape (1,4) to result in a dataframe of shape (2,4). The only way I can find to do this is by converting the series to a dataframe first. According to the concat docs, I would expect to be able to do this with a Series without converting to a dataframe. I'll update my question to reflect these clarifications. – topher217 Jul 23 '21 at 03:31
  • what about something like `pd.concat([df3.T, series1], axis=1).T` – tdy Jul 23 '21 at 23:05
  • @tdy yes! This is what I was looking for. Well, this `pd.concat([series1, df2.T], axis=1).T` to be more precise, but the double transpose works. It makes sense, I guess, as transposing a 1-d Series is kind of a malformed intention, so you have to transpose the dataframe instead. This seems hacky though, compared to how `append` appears to handle this without having to transpose and then reverse the transpose. – topher217 Jul 24 '21 at 08:04

2 Answers

The reason is that .iloc returns a Series when it is given a single integer, as in df1.iloc[0], but a DataFrame when it is given a list. The extra brackets in df1.iloc[[0]] turn the single integer 0 into a list containing only 0. pd.concat() returns a Series if all of its inputs are Series, but if any input is a DataFrame it always returns a DataFrame. With df1.iloc[0], this forces pandas to build a DataFrame from the Series: the Series labels (a, b, c, d) become row labels, and the values are entered vertically under a column named 0. When df2 is concatenated with that converted Series, it keeps its own row and column labels, which do not match those of the converted Series, so every cell without a matching label is filled with NaN. That is where the 5x5 result comes from.
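To make the label mismatch visible, here is a minimal sketch that rebuilds the frames from the question and inspects the labels of the mixed concat. The name mixed is illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3, 'd': 4}])
df2 = pd.DataFrame([{'a': 5, 'b': 6, 'c': 7, 'd': 8}])

# DataFrame + Series along axis=0: the Series is treated as a single column
mixed = pd.concat([df2, df1.iloc[0]], axis=0)

print(mixed.shape)          # (5, 5)
print(list(mixed.index))    # df2's row label 0 plus the Series labels a, b, c, d
print(list(mixed.columns))  # df2's columns a, b, c, d plus the Series column 0
print(mixed)                # cells with no matching label show up as NaN
```

Only 8 of the 25 cells hold data (4 from df2, 4 from the Series); the rest have no source value.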

EDIT:

This code should get you a (2,4) Data Frame with minimal effort:

import pandas as pd

mydict1 = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}]
mydict2 = [{'a': 5, 'b': 6, 'c': 7, 'd': 8}]
df1 = pd.DataFrame(mydict1)
df2 = pd.DataFrame(mydict2)
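# Reshape each 1x4 frame to long form: one row per column label, indexed by that label
DF1 = df1.melt().set_index('variable')
DF2 = df2.melt().set_index('variable')
# Add df2's values as a second column, aligned on the shared 'variable' index
DF1.insert(1, 'col_name', DF2['value'], True)

DF1  # 4x2 data frame

answer = DF1.T  # 2x4 data frame, rows labelled 'value' and 'col_name'
answer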

I hope this helps.

j__carlson
  • This is a nice explanation of the `iloc` syntax questions I had. I edited my question to be more explicit regarding the end issue (how to get `concat` to take a series of shape (4,) and a df of shape (1,4) and get a (2,4) df in return with the series being the top row. Are you able to edit your answer to include a solution for this? – topher217 Jul 23 '21 at 04:04

As discussed in the comments, you can concat using a double transpose:

pd.concat([series1, df2.T], axis=1).T.reset_index(drop=True)

#    a  b  c  d
# 0  1  2  3  4
# 1  5  6  7  8

However, note that it's much faster to prepend the Series as a list insertion:

%%timeit
data = df2.values.tolist()
data.insert(0, series1.tolist())
pd.DataFrame(data, columns=df2.columns)

# 367 µs ± 37.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

As opposed to dataframe expansion, especially if you're planning to prepend often:

%%timeit
pd.concat([series1, df2.T], axis=1).T.reset_index(drop=True)

# 1.36 ms ± 44.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
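
For reuse outside of a notebook, the list-insertion approach can be wrapped in a small function. This is only a sketch: prepend_row is a hypothetical helper name, not a pandas function, and it assumes the frame holds a single dtype (here all integers), since .values would upcast mixed columns.

```python
import pandas as pd

df2 = pd.DataFrame([{'a': 5, 'b': 6, 'c': 7, 'd': 8}])
series1 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4})

def prepend_row(frame, row):
    """Return a new DataFrame with `row` as the first row (hypothetical helper)."""
    data = frame.values.tolist()
    # Order the Series by the frame's columns so each value lands in the right place
    data.insert(0, row.reindex(frame.columns).tolist())
    return pd.DataFrame(data, columns=frame.columns)

result = prepend_row(df2, series1)
print(result)
#    a  b  c  d
# 0  1  2  3  4
# 1  5  6  7  8
```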
tdy