How to join every 2 rows with 2 columns into one row in Python?

Question

I am trying to join every two rows, taking two columns from each, into a single row.

I have data like this, stored in a text file:

7.0 1042.3784354104064 1041.8736266399212 0.0
7.0 567.603384919274 566.8152346188947 0.0
8.0 709.5076838990026 709.3588638367074 0.0
8.0 386.811514883702 386.6412338380912 0.0

The expected output looks like this:

1042.3784354104064 1041.8736266399212 567.603384919274 566.8152346188947
709.5076838990026 709.3588638367074 386.811514883702 386.6412338380912

What format is this data in? Nested lists, a pandas dataframe, ...? — Patrick Haugh, Apr 12 '19 at 19:23
Your expected output isn't a transpose. It looks like you want to take all of the rows that share a first column and join their second and third columns into a list? — Patrick Haugh, Apr 12 '19 at 19:25
Yes, this is what i want. Sorry i think it will be like transposing — Hadeer Zayat, Apr 12 '19 at 19:27

Patrick Haugh · Answer 1 · 2019-04-12T20:26:20.310

1

You can create a dictionary mapping your first column values to lists, and then populate those lists as you iterate through your matrix:

from collections import defaultdict

matrix = [[7.0, 1042.3784354104064, 1041.8736266399212, 0.0],
[7.0, 567.603384919274, 566.8152346188947, 0.0],
[8.0, 709.5076838990026, 709.3588638367074, 0.0],
[8.0, 386.811514883702, 386.6412338380912, 0.0]]

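# Map each first-column value to the list of values collected for it
dd = defaultdict(list)

# Unpack each row: the first column is the key, the last column is discarded,
# and the middle columns are appended to that key's list
for key, *values, discard in matrix:
    dd[key].extend(values)

result = list(dd.values())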

print(result)
# [[1042.3784354104064, 1041.8736266399212, 567.603384919274, 566.8152346188947], 
#  [709.5076838990026, 709.3588638367074, 386.811514883702, 386.6412338380912]]
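
Since the data in the question lives in a text file, the same dictionary approach can be applied to lines of text rather than a hard-coded matrix. The following is a minimal sketch, not a definitive implementation: the helper name `group_rows` is hypothetical, the rows are assumed to be whitespace-separated with at least two columns, and sorting by key is assumed to be the desired row order.

```python
from collections import defaultdict


def group_rows(lines):
    """Group the middle columns of whitespace-separated rows by first column.

    `group_rows` is a hypothetical helper name used only for illustration.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        # First field is the key, last field is discarded, the rest are kept
        key, *values, _discard = map(float, fields)
        groups[key].extend(values)
    # Sort by key (an assumption about the wanted order) so the result
    # does not depend on dictionary ordering
    return [groups[key] for key in sorted(groups)]


# Sample lines from the question; with a real file this would be
#     with open('file.txt') as f:
#         result = group_rows(f)
sample = [
    "7.0 1042.3784354104064 1041.8736266399212 0.0",
    "7.0 567.603384919274 566.8152346188947 0.0",
    "8.0 709.5076838990026 709.3588638367074 0.0",
    "8.0 386.811514883702 386.6412338380912 0.0",
]

result = group_rows(sample)
for row in result:
    print(*row)  # one joined row per line, values separated by spaces
```

On Python 3.7 and later, plain dictionaries keep insertion order, so the explicit sort only matters on older versions or when the file itself is not in key order.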

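Here's a pure NumPy solution (based on another answer). It assumes the rows in the file are already sorted by the first column and that every key has the same number of rows: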

import numpy as np

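# Load the whitespace-separated file into a 2-D float array
mat = np.loadtxt('file.txt')

# Row offsets at which the first-column value changes
indices = np.cumsum(np.unique(mat[:, 0], return_counts=True)[1])[:-1]

# Split the middle columns at those offsets and flatten each group into one row
result = np.array(np.split(mat[:, 1:-1], indices)).reshape((len(indices)+1, -1))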
print(result)
# [[1042.37843541 1041.87362664  567.60338492  566.81523462]
#  [ 709.5076839   709.35886384  386.81151488  386.64123384]]
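
If the rows really do come in consecutive pairs, as the question title suggests, a plain reshape may be enough and the key column is not needed at all. This is only a sketch under that assumption: `rows_per_group` is a name introduced here for illustration, and the array is built inline so the example runs without the file.

```python
import numpy as np

# Sample rows from the question; np.loadtxt('file.txt') would produce
# the same array when reading the text file
mat = np.array([
    [7.0, 1042.3784354104064, 1041.8736266399212, 0.0],
    [7.0, 567.603384919274, 566.8152346188947, 0.0],
    [8.0, 709.5076838990026, 709.3588638367074, 0.0],
    [8.0, 386.811514883702, 386.6412338380912, 0.0],
])

# Drop the first (key) and last columns, leaving the two middle columns
middle = mat[:, 1:-1]

# Assumption: every group consists of exactly this many consecutive rows
rows_per_group = 2

# Lay each run of `rows_per_group` rows side by side in a single row
result = middle.reshape(-1, rows_per_group * middle.shape[1])
print(result)
```

The reshape raises a `ValueError` if the number of rows is not a multiple of `rows_per_group`, which acts as a basic check on the pairing assumption.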

edited Apr 12 '19 at 20:26

answered Apr 12 '19 at 19:31

Patrick Haugh


It works, but my original data is stored in a text file. Also, the output prints the second row first and the expected first row second – Hadeer Zayat Apr 12 '19 at 19:54
@HadeerZayat What is your expected order? First appearance of the key in the file, sorted by key, the data in the file is already grouped? – Patrick Haugh Apr 12 '19 at 19:57
When I tried your code, it gave me the following output: "[709.5076838990026, 709.3588638367074, 386.811514883702, 386.6412338380912]]", then it printed "[1042.3784354104064, 1041.8736266399212, 567.603384919274, 566.8152346188947]". I need it to be # [[1042.3784354104064, 1041.8736266399212, 567.603384919274, 566.8152346188947], # [709.5076838990026, 709.3588638367074, 386.811514883702, 386.6412338380912]] – Hadeer Zayat Apr 12 '19 at 20:07
@HadeerZayat I've added a pure numpy solution that should also preserve order. – Patrick Haugh Apr 12 '19 at 20:26

score 0 · Answer 2 · answered Apr 12 '19 at 19:27

The following code will transpose a list of lists, which I believe is what you're asking for. You can trim this new_data if you are looking to remove some rows.

raw_data = [[7.0, 1042.3784354104064, 1041.8736266399212, 0.0],
[7.0, 567.603384919274, 566.8152346188947, 0.0],
[8.0, 709.5076838990026, 709.3588638367074, 0.0],
[8.0, 386.811514883702, 386.6412338380912, 0.0]]

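new_data = []
for i, data in enumerate(raw_data):
    for j, d in enumerate(data):
        # On the first row, create one output list per column
        if i == 0:
            new_data.append([])
        # Append the value to the list for its column
        new_data[j].append(d)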

print(new_data)
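
The nested loops above build the transpose by hand. A shorter sketch that should produce the same result uses the built-in `zip`. Note that a transpose turns each column into a row, which is not the same as the grouped output shown in the question.

```python
raw_data = [[7.0, 1042.3784354104064, 1041.8736266399212, 0.0],
            [7.0, 567.603384919274, 566.8152346188947, 0.0],
            [8.0, 709.5076838990026, 709.3588638367074, 0.0],
            [8.0, 386.811514883702, 386.6412338380912, 0.0]]

# zip(*raw_data) pairs up the i-th element of every row, i.e. yields columns;
# each column tuple is converted to a list to match the loop-based version
new_data = [list(column) for column in zip(*raw_data)]

print(new_data)
```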
