0

If I have a df:

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])

and wish to take the second column "b" and append to the end of a "new" df with the columns "a" and "b" and a name column containing the name of the "b" column and then the third column "c" and append to the end of the new df together with "a" and the name "c" appened to the name column. It is timeseries data with a datetime in "a" and a variable in b and c and there is sometimes 20 variables and sometimes 1 or 2.

How do I do that in a pretty and efficient way. right now im doing it like this but have to do it a 100 times for slightly different df but with the same idea.

col_nam_list = list(df.columns.values)
df_1 = pd.DataFrame()
df_1["a"] = df["a"]
df_1["name"] = col_nam_list[1]
df_1["value"] = df["b"]

df_2 = pd.DataFrame()
df_2["a"] = df["a"]
df_2["name"] = col_nam_list[2]
df_2["value"] = df["c"]

result = pd.concat([df_1, df_2])

This should be the output results

Now this is not fun to write and looks so ugly and unnecessary long. How do I improve my method?

BR

  • From your code, it seems you want to have `a` twice. And are you using df_1 and df_2 for anything else later or just as a temporals dataframe you can later concatenate to form the result? – Ignacio Alorre Jan 14 '21 at 12:51
  • yes I want a 2 column df with "a" twice and then b and c after each other. I only have df_1 and df_2 to make the results df in the end. – TheFrederik Jan 14 '21 at 12:53
  • or actually ill have a .3 column with the name of "b" and "c" repeated x number of times. ill try to adjust my question to make it more usable for others. – TheFrederik Jan 14 '21 at 12:54
  • 1
    Probably this is what you are looking for: https://stackoverflow.com/questions/13728208/several-time-series-to-dataframe –  Jan 14 '21 at 12:55
  • Does this answer your question? [Several time series to DataFrame](https://stackoverflow.com/questions/13728208/several-time-series-to-dataframe) – Chris Jan 14 '21 at 12:56
  • I adjusted the question. It gave the wrong results. and added a screen dump of the desired results – TheFrederik Jan 14 '21 at 13:08
  • I dont think its the same problem. I would like to not use that much manual time on splitting the df up into a lot of dfs before I add them together in the end. – TheFrederik Jan 14 '21 at 13:12

1 Answers1

2

IIUC, you can use pd.DataFrame.melt with parameter id_vars equal to 'a',

df.melt('a')

Output:

   a variable  value
0  1        b      2
1  4        b      5
2  7        b      8
3  1        c      3
4  4        c      6
5  7        c      9
Scott Boston
  • 147,308
  • 15
  • 139
  • 187