Convert a pandas dataframe to tab separated list in Python

Question

I have a dataframe like below:

import numpy as np
import pandas as pd
data = {'Words':['actually','he','came','from','home','and','played'], 
        'Col2':['2','0','0','0','1','0','3']}
data = pd.DataFrame(data)

I write this dataframe to disk with the following command:

np.savetxt('/folder/file.txt', data.values,fmt='%s', delimiter='\t')

The next script then reads it with the following line:

data = load_file('/folder/file.txt')

Here is the load_file function, which reads a text file:

def load_file(filename):
    with open(filename, 'r', encoding='utf-8') as f:
        data = f.readlines()
    return data

The result is a list of tab-separated strings.

print(data)

gives me the following output:

['actually\t2\n', 'he\t0\n', 'came\t0\n', 'from\t0\n', 'home\t1\n', 'and\t0\n', 'played\t3\n']

I don't want to write the file to disk and then read it back for processing. Instead, I want to convert the dataframe to a list of tab-separated strings and process it directly. How can I achieve this?
I checked existing answers, but most of them convert a list to a dataframe, not the other way around. Thanks in advance.

`data.to_csv(header=None, index=False, sep='\t').split('\n')` ?? — Pygirl, Jan 29 '21 at 06:39
Python lists don't have separators, though comma is used in the display of a list. You can make a string with tab and newline separators. What kind of processing are you doing? — hpaulj, Jan 29 '21 at 07:10

Pygirl · Accepted Answer · 2021-01-29T10:48:18.963

2

Try using .to_csv()

df_list = data.to_csv(header=None, index=False, sep='\t').split('\n')

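df_list:

['actually\t2',
 'he\t0',
 'came\t0',
 'from\t0',
 'home\t1',
 'and\t0',
 'played\t3'
]

Note: split('\n') also returns an empty string '' as the last element, because the text produced by to_csv() ends with a newline. It was removed by hand from the output above.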

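To keep the '\n' separators and avoid the trailing empty string, strip the final newline with rstrip() before splitting:

df_list = data.to_csv(header=None, index=False, sep='\t').rstrip().replace('\n', '\n\\n').split('\\n')

df_list:

['actually\t2\n',
 'he\t0\n',
 'came\t0\n',
 'from\t0\n',
 'home\t1\n',
 'and\t0\n',
 'played\t3'
]

Because rstrip() removes the final newline, the last element has no trailing '\n'.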
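As an alternative sketch (not part of the original answer), str.splitlines(keepends=True) splits the text returned by to_csv() while keeping each line terminator. This avoids the empty trailing element, and every element, including the last, keeps its line ending, which matches what readlines() returned in the question. The exact terminator ('\n' or '\r\n') depends on how pandas is configured on your platform, so treat that part as an assumption to verify.

```python
import pandas as pd

data = pd.DataFrame({
    'Words': ['actually', 'he', 'came', 'from', 'home', 'and', 'played'],
    'Col2': ['2', '0', '0', '0', '1', '0', '3'],
})

# No path is given, so to_csv() returns the tab-separated text as a string.
text = data.to_csv(header=False, index=False, sep='\t')

# keepends=True keeps each line terminator; no empty trailing element is produced.
df_list = text.splitlines(keepends=True)
print(df_list)
```

If the line endings are not wanted, calling text.splitlines() without keepends drops them in the same step.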

edited Jan 29 '21 at 10:48

answered Jan 29 '21 at 06:41


thanks @pygirl... exactly what I needed :) – Varun kadekar Jan 29 '21 at 07:14
I get the last element as '' and had to remove it. Not sure whether you faced the same, but at least it's not visible in the output you have given above – Varun kadekar Jan 29 '21 at 10:36
Actually, I removed it manually from the output. Yes, I faced the same thing, because I added \\n to the last row. This should be avoided for the last value. You can use a loop to filter out the empty string – Pygirl Jan 29 '21 at 10:37
Ah... now I see, the comma at the end was indeed my mistake. But even after removing it, I got an empty element at the end, something like this: ['actually\t2\r', 'he\t0\r', 'came\t0\r', 'from\t0\r', 'home\t1\r', 'and\t0\r', 'played\t3\r', ''] – Varun kadekar Jan 29 '21 at 10:42
Sorry, I was in a hurry. Correction: it's because the last row contains `\n`. One way of solving this is to use a regex, or you can filter it out using a loop. I will update my answer :) – Pygirl Jan 29 '21 at 10:43
@Varunkadekar: I have updated my answer :) the string has `\n` in the end which can be removed easily by using `rstrip()` – Pygirl Jan 29 '21 at 10:48
thanks... I had excluded final element in next step, but this does the trick in one line. kudos :) – Varun kadekar Jan 29 '21 at 14:55

score 1 · Answer 2 · answered Jan 29 '21 at 06:42

I think this achieves the same result without writing to the drive:

df_list = list(data.apply(lambda row: row['Words'] + '\t' + row['Col2'] + '\n', axis=1))

Iker Olarra


@Varunkadekar: Just for info, apply should be avoided where possible because it makes performance slow :) https://stackoverflow.com/questions/54432583/when-should-i-not-want-to-use-pandas-apply-in-my-code – Pygirl Jan 29 '21 at 10:50
thanks @Pygirl, I really appreciate the pointers you're throwing in here. I was under the impression that apply was better than a for loop... at least in R – Varun kadekar Jan 29 '21 at 14:43
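For the two string columns in the question, the same list can be built without apply by concatenating the columns directly. This is a minimal sketch, assuming both 'Words' and 'Col2' hold strings as in the question's data; it is not taken from any of the answers.

```python
import pandas as pd

data = pd.DataFrame({
    'Words': ['actually', 'he', 'came', 'from', 'home', 'and', 'played'],
    'Col2': ['2', '0', '0', '0', '1', '0', '3'],
})

# Element-wise string concatenation on whole columns, with no per-row Python function call.
df_list = (data['Words'] + '\t' + data['Col2'] + '\n').tolist()
print(df_list)
```

The column names are hard-coded here, so this form suits a small, fixed set of columns better than an arbitrary frame.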

score 1 · Answer 3 · answered Jan 29 '21 at 06:55

Try:

data.apply("\t".join, axis=1).tolist()

Lambda


thanks Lambda... this works too. It's just that I needed '\n' at the end for my next processing step. – Varun kadekar Jan 29 '21 at 10:43
You can use `data.apply(lambda x: "\t".join(x)+"\n", axis=1).tolist()` – Lambda Jan 30 '21 at 09:10
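One caveat with "\t".join is that it only accepts strings, so it raises a TypeError if a column holds numbers. The sketch below assumes, purely for illustration, that 'Col2' contains integers rather than the strings used in the question; it converts the frame with astype(str) first and then joins each row in a list comprehension instead of apply.

```python
import pandas as pd

# Assumption for illustration: Col2 holds integers here, unlike the question's string values.
data = pd.DataFrame({
    'Words': ['actually', 'he', 'came'],
    'Col2': [2, 0, 0],
})

# Convert every cell to str, turn the rows into plain Python lists, then join each row.
rows = data.astype(str).to_numpy().tolist()
df_list = ['\t'.join(row) + '\n' for row in rows]
print(df_list)
```

This works for any number of columns, since each row is joined as a whole rather than by column name.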
