Note: "Cartesian product" might not be the right term, since we are working with data, not sets. It is more like a "free product" or "words".

There is more than one way to turn a dataframe into a list of lists.

Here is one way.

In that case, the list of lists is actually a list of columns, where the index within each inner list is the row index.

What I want to do is take a data frame, select specific columns by name, and then produce a new list whose inner lists are Cartesian products of the elements from the selected columns. A simplified example is given here:

import pandas as pd
df = pd.DataFrame([[1,2,3],[3,4,5]])

magicMap(df)  # hypothetical function I am looking for

# desired result:
# [[1,3],[2,4],[3,5]]

With column names:

df  # a DataFrame with named columns
magicMap(df, listOfColumnNames)
# desired result:
# [[c1r1, c2r1, ...], [c1r2, c2r2, ...], [c1r3, c2r3, ...], ...]

Note: "cirj" is column i row j.

Is there a simple way to do this?

user442920
  • Is `[[1,3],[2,4],[3,5]]` the Cartesian product? Could you please add a more meaningful example? It seems this can be done with itertools, though. – Dani Mesejo Dec 19 '20 at 21:07
  • I am not sure if cartesian product is the right language. We are working with data, not sets, so it would be more like free product or words. – user442920 Dec 19 '20 at 21:16
  • It seems that you actually want the transpose, and you already have an answer for that :) – Dani Mesejo Dec 19 '20 at 21:18

2 Answers

The code

import pandas as pd
df = pd.DataFrame([[1,2,3],[3,4,5]])
df2= df.transpose()

goes from df

    0   1   2
0   1   2   3
1   3   4   5

to df2

    0   1
0   1   3
1   2   4
2   3   5

which looks like what you need:

df2.values.tolist()

[[1, 3], [2, 4], [3, 5]]

To get the columns in the order you want, use `df3 = df2.reindex(columns=column_names)`, where `column_names` lists the columns in the desired order.
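
Putting the pieces together, here is a minimal sketch of what the `magicMap` from the question might look like. The name `magic_map`, its signature, and the column labels `"a"`, `"b"`, `"c"` are assumptions made for illustration; they are not part of pandas. Because the transpose turns the original row labels into the columns of `df2`, the sketch selects columns by name before transposing:

```python
import pandas as pd


def magic_map(df, column_names=None):
    """Hypothetical helper: return one inner list per selected column."""
    # Select (and order) the columns first; after .T they become rows.
    selected = df if column_names is None else df[column_names]
    return selected.T.values.tolist()


# Same data as above, with assumed column names for the example.
df = pd.DataFrame([[1, 2, 3], [3, 4, 5]], columns=["a", "b", "c"])

result_all = magic_map(df)                # [[1, 3], [2, 4], [3, 5]]
result_subset = magic_map(df, ["c", "a"])  # [[3, 5], [1, 3]]
print(result_all)
print(result_subset)
```

The order of the names passed in determines the order of the inner lists.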

Paul Brennan
  • Neat answer, thanks. I'm not sure it will work perfectly, so I will try it. – user442920 Dec 19 '20 at 21:17
  • For `transpose()`, you can use `.T` instead. You can also unpack the values into a list of row arrays like this: `[*df2.values]`. This just makes the syntax a bit more compact, so the entire solution could be `[*df.T.values]` – David Erickson Dec 19 '20 at 21:36

You can also convert the dataframe to a NumPy array with:

df.T.to_numpy()

array([[1, 3],
       [2, 4],
       [3, 5]], dtype=int64)

If it must be a list, then use the other answer provided or use:

df.T.to_numpy().tolist()
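
The question's two examples point in different directions: the numeric example produces one inner list per column, while the named-column example (`[c1r1, c2r1, ...]`) reads as one inner list per row. The following sketch shows both orientations side by side so you can pick the one you need. The column names `c1`, `c2`, `c3` and the variable names are assumptions for illustration only:

```python
import pandas as pd

# Assumed example data with named columns.
df = pd.DataFrame({"c1": [1, 3], "c2": [2, 4], "c3": [3, 5]})
column_names = ["c1", "c2"]

# One inner list per selected column (transpose, as in this answer).
by_column = df[column_names].T.to_numpy().tolist()  # [[1, 3], [2, 4]]

# One inner list per row, i.e. [c1r1, c2r1], [c1r2, c2r2] (no transpose).
by_row = df[column_names].to_numpy().tolist()       # [[1, 2], [3, 4]]

print(by_column)
print(by_row)
```
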
David Erickson