I have a file with single-column data. A few of the values need to be turned into column headers. After a few dask computations, I have reduced my dataframe to the following:

In [9]: df.compute()
Out[9]:
                                    *
0                    140 Global Intel
1                         1 Frequency
2                          2 Currency
3               3 Currency Conversion
4                            4 Market
5                      5 Segmentation
6                            6 Sector

Is it possible to transpose the rows into columns and create a new dataframe using dask itself? Any help is appreciated.

EDIT: Here is how my final dataframe should look after the transpose.

In [22]: df_final
Out[22]:
Empty DataFrame
Columns: [140 Global Intel, 1 Frequency, 2 Currency, 3 Currency Conversion, 4 Market, 5 Segmentation, 6 Sector]
Index: []
S Verma
  • But then do you want to use the single column dataset with that header? Or use it for another header? In the latter case, you could get the values and create a new dataframe with those values as header (you can specify that option when creating or updating a dataframe, at least in pandas, which should have the same API as dask) – lsabi Jan 28 '20 at 08:33
  • @Isabi I have included final output for reference – S Verma Jan 28 '20 at 08:40
  • According to [https://github.com/dask/dask/issues/1651](https://github.com/dask/dask/issues/1651), df.compute() should return a pandas dataframe. So, you can use the pandas api. df_pandas = df.compute() cols = df_pandas.values.tolist() . Then create a new dask dataframe as shown in [https://stackoverflow.com/questions/39721800/convert-pandas-dataframe-to-dask-dataframe](https://stackoverflow.com/questions/39721800/convert-pandas-dataframe-to-dask-dataframe) – lsabi Jan 28 '20 at 09:01

1 Answer

You can create an empty DataFrame whose columns are the values of one column of your DataFrame df (in your case, column = '*'):

import pandas as pd
column = '*'
# Compute to pandas, then pass the column's values as a flat list of labels
df_empty = pd.DataFrame(columns=df.compute()[column].tolist())

If you print df_empty:

Empty DataFrame
Columns: [140 Global Intel, 1 Frequency, 2 Currency, 3 Currency Conversion, 4 Market, 5 Segmentation, 6 Sector]
Index: []

If you want to switch back to Dask, use dd.from_pandas

DavidK