I have a big dataframe (df) X with ~2000 rows and n columns (n is ~30000). The column names look like this:

A,B,C,D,F,G,H,v1,453,73v,4-5,ss,9-dd,...,n

The elements of X are a mix of integers, floats and strings.

Using Python or Unix/bash, I want to split X into n-7 dfs. Each resulting df keeps the first 7 columns of X plus exactly one of the remaining columns. Thus, the first 3 dfs will have the following columns:

A,B,C,D,F,G,H,v1

A,B,C,D,F,G,H,453

A,B,C,D,F,G,H,73v

and so on...

I want each resulting df to be named after its last column + ".txt". So, the first three dfs will be called "v1.txt", "453.txt" and "73v.txt".

This post is somewhat similar to: Split huge file into n files keeping first 7 columns + next 3 columns until column n

but I am not able to adapt it.

Lucas

1 Answer

You can proceed as follows:

import pandas as pd
import numpy as np
np.random.seed(42)

df = pd.DataFrame({'A': np.random.randint(0, 100, 10),
                   'B': np.random.randint(0, 100, 10),
                   'C': np.random.randint(0, 100, 10),
                   'D': np.random.randint(0, 100, 10),
                   'F': np.random.randint(0, 100, 10),
                   'G': np.random.randint(0, 100, 10),
                   'H': np.random.randint(0, 100, 10),
                   'v1': np.random.randint(0, 100, 10),
                   '453': np.random.randint(0, 100, 10),
                   '73v': np.random.randint(0, 100, 10)})

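# Write one tab-separated file per column after the first seven
for i in range(7, df.shape[1]):
    # np.r_ joins positions 0..6 with the current column position i
    sub_df = df.iloc[:, np.r_[0:7, i]]
    # index=False keeps the row index from being written as an extra column
    sub_df.to_csv(f'{df.columns[i]}.txt', sep='\t', index=False)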
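
The same loop could also be wrapped in a small function that selects columns by name rather than by position. This is only a sketch under a few assumptions: the helper name `split_by_column` and its parameters `out_dir`, `n_key` and `sep` are made up for illustration, column names are unique, and every column name is a valid file name (a name containing "/" would need to be sanitized first).

```python
import os
import tempfile

import pandas as pd


def split_by_column(df, out_dir, n_key=7, sep="\t"):
    """Write one file per non-key column: the first n_key columns plus that column.

    Hypothetical helper; assumes unique column names that are safe as file names.
    """
    key_cols = list(df.columns[:n_key])
    written = []
    for col in df.columns[n_key:]:
        path = os.path.join(out_dir, f"{col}.txt")
        # Select by label; index=False keeps the row index out of the file
        df[key_cols + [col]].to_csv(path, sep=sep, index=False)
        written.append(path)
    return written


# Small demo frame with the column layout from the question
names = ["A", "B", "C", "D", "F", "G", "H", "v1", "453", "73v"]
demo = pd.DataFrame({name: range(3) for name in names})

out_dir = tempfile.mkdtemp()
paths = split_by_column(demo, out_dir)
```

For the real data, `demo` would be replaced by the full df X, for example loaded with `pd.read_csv`, and `out_dir` by the target directory.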
David M.