I have a tab-separated file like this small example:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676883 137676900 123 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 137676900 137676949 42 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 92 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
chr5 139945450 139945500 77 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I want to group the lines based on the 5th, 6th and 7th columns and sum the values of the 4th column in each group.
Here is the expected output:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I am trying to do this in Python using the following code, but it does not work. Do you know how to fix it?
import pandas as pd
df = pd.read_csv('myfile.txt', sep='\t', header=None)
df = df.groupby(5, 6, 7, 8).sum()
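One possible fix, sketched under a few assumptions. With header=None pandas labels the columns 0 to 8, so the 5th, 6th and 7th columns are 4, 5 and 6 and the 4th column is 3. Also, groupby takes its keys as a single list rather than as separate positional arguments. The expected output does not pin down which row of a group supplies the other columns (the FAM53C line shows the second row's coordinates, the SLC35A4 line the first row's), so this sketch assumes the first row of each group is kept. The sample rows are inlined through io.StringIO in place of 'myfile.txt' so that the snippet runs on its own.

```python
import io

import pandas as pd

# Sample rows from the question, tab separated (stand-in for 'myfile.txt').
sample = (
    "chr5\t112312630\t112312650\t31\tchr5\t112312630\t112321662\tDCP2\tENST00000543319.1\n"
    "chr5\t137676883\t137676900\t123\tchr5\t137676883\t137676949\tFAM53C\tENST00000434981.2\n"
    "chr5\t137676900\t137676949\t42\tchr5\t137676883\t137676949\tFAM53C\tENST00000434981.2\n"
    "chr5\t139944400\t139944450\t92\tchr5\t139944064\t139946344\tSLC35A4\tENST00000323146.3\n"
    "chr5\t139945450\t139945500\t77\tchr5\t139944064\t139946344\tSLC35A4\tENST00000323146.3\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# 0-based labels: 5th, 6th, 7th columns -> 4, 5, 6; 4th column -> 3.
group_cols = [4, 5, 6]

# Assumption: every non-key column keeps the value from the first row
# of its group, except column 3, which is summed.
agg = {col: "first" for col in df.columns if col not in group_cols}
agg[3] = "sum"

result = (
    df.groupby(group_cols, sort=False, as_index=False)
    .agg(agg)[list(df.columns)]  # put the columns back in file order
)

# Print as tab-separated text; pass a file path as the first argument
# to to_csv to write a file instead.
print(result.to_csv(sep="\t", header=False, index=False), end="")
```

Replacing "first" with "last" in the dictionary would keep the last row of each group instead, if that is the row you want to see in the output.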