Convert csv file with python

Question

I'm new into python, does somebody have an idea what would be a good approach? I could just script it, but it's probably faster to use a package.

I have this .csv file (gigabytes large):

name,   value,  time
A,   1, 10
B,   2, 10
C,   3, 10
C,   3, 10 (should ignore duplicates, or non complete (A,B,C) entries
A,   4, 12 (should be sorted by time, this entry should be at the end, after time==11)
B,   5, 12
C,   6, 12
B,   7, 11 (order of A,B,C might be different)
C,   8, 11
A,   9, 11

convert it to a new .csv file containing:

time,   A,  B,  C
10, 1,  2,  3
11, 9,  7,  8
12, 4,  5,  6

A good approach would be to research how you can parse CSV with python, and figure out an algorithm that will do what you want. Hope this helps! — N. Ivanov, Apr 10 '18 at 13:10
Aside from filtering, the operation you're trying to do is [converting long-form data to wide](https://stackoverflow.com/questions/22798934/pandas-long-to-wide-reshape). — caw5cv, Apr 10 '18 at 13:12

score 6 · Accepted Answer · answered Apr 10 '18 at 13:11

6

I think need drop_duplicates with pivot:

df = df.drop_duplicates().pivot('time','name','value')
print (df)
name  A  B  C
time         
10    1  2  3
11    9  7  8
12    4  5  6

answered Apr 10 '18 at 13:11

jezrael

822,522
95
1,334
1,252

1

Thanks, very useful! I was stuck at just reading the csv file. Together with the addition of @divyang this solved my question. – Sheldon Apr 10 '18 at 15:05
3

Very useful command this pivot(), the documentation has a very similar example https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html – Sheldon Apr 10 '18 at 15:22
@Sheldon Glad can help! – jezrael Apr 10 '18 at 15:24

score 2 · Answer 2 · answered Apr 10 '18 at 13:55

Since I can't comment I would like to add to @jezrael answer that you would also want to drop incomplete or NaN values. By using df.dropna

import numpy as np
import pandas as pd
A = 'a'
B = 'b'
C = 'c'
df = pd.DataFrame([[A,   1, 10],
                [B,   2, 10],
                [C,   3, 10],
                [C,   3, 10],
                [A,   4, 12],
                [B,   5, 12],
                [C,   6, 12],
                [B,   7, 11],
                [C,   8, 11],
                [A,   9, 11],
                [np.nan, 10, 0]], columns = ["name","value", "time"])
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)
df = df.pivot('time','name','value')
print(df)

Convert csv file with python

2 Answers2