How to read a CSV file and extract a specific column?

Question

This is my CSV file:

CommitId                                RefactoringType      RefactoringDetail
d38f7b334856ed4007fb3ec0f8a5f7499ee2f2b8    Pull Up Attribute   "Pull Up Attribute  protected steps : int from class blokusgame.mi.android.hazi.blokus.GameLogic.PlayerAlgorithm to class blokusgame.mi.android.hazi.blokus.GameLogic.Player"
d38f7b334856ed4007fb3ec0f8a5f7499ee2f2b8    Pull Up Attribute   "Pull Up Attribute  protected steps : int from class blokusgame.mi.android.hazi.blokus.GameLogic.PlayerAlgorithm to class blokusgame.mi.android.hazi.blokus.GameLogic.Player"
d38f7b334856ed4007fb3ec0f8a5f7499ee2f2b8    Pull Up Attribute   "Pull Up Attribute  protected steps : int from class blokusgame.mi.android.hazi.blokus.GameLogic.PlayerAlgorithm to class blokusgame.mi.android.hazi.blokus.GameLogic.Pla

I need to extract this:

RefactoringDetail
"Pull Up Attribute  protected steps : int from class blokusgame.mi.android.hazi.blokus.GameLogic.PlayerAlgorithm to class blokusgame.mi.android.hazi.blokus.GameLogic.Player"
"Pull Up Attribute  protected steps : int from class blokusgame.mi.android.hazi.blokus.GameLogic.PlayerAlgorithm to class blokusgame.mi.android.hazi.blokus.GameLogic.Player"
"Pull Up Attribute  protected steps : int from class blokusgame.mi.android.hazi.blokus.GameLogic.PlayerAlgorithm to class blokusgame.mi.android.hazi.blokus.GameLogic.Player"

I tried this code:

import pandas as pd
df = pd.read_csv('result_refactorings.csv', sep='delimiter', header=None)
df.iloc[:,-1]

It returns all the data.

Any help, please!

How about `pd.read_csv('result_refactorings.csv', sep='delimiter', usecols=['RefactoringDetail'])`? — Chris, May 29 '19 at 06:53
duplicate: https://stackoverflow.com/questions/19486369/extract-csv-file-specific-columns-to-list-in-python — Zaraki Kenpachi, May 29 '19 at 06:54
Possible duplicate of [Extract csv file specific columns to list in Python](https://stackoverflow.com/questions/19486369/extract-csv-file-specific-columns-to-list-in-python) — skillsmuggler, May 29 '19 at 06:56
@Chris I tried it and it returns this error: columns expected but not found: ['RefactoringDetail'] — Henda Drid, May 29 '19 at 06:59

Zhenhir · Answer 1 · 2019-05-29T07:22:56.750

1

If you just want to use the built-in csv module:

import csv
import re

third_column = []
with open("result_refactorings.csv") as csvfile:
    # Collapse each run of two or more spaces into a single tab
    fixed_spaces = [re.sub(" {2,}", "\t", x) for x in csvfile]
    # Parse the normalized lines as tab-delimited, keyed by the header row
    reader = csv.DictReader(fixed_spaces, delimiter="\t")
    for row in reader:
        print(row["RefactoringDetail"])
        third_column.append(row["RefactoringDetail"])

This code both prints the third column and appends each of its values to the list third_column; remove whichever part you don't need.

EDIT: On closer inspection, the CSV input you posted appears to be delimited by an uneven number of spaces rather than actual tabs, which is what it looks like. I added a small regex that replaces two or more consecutive spaces with a tab, since in its current state it isn't a valid CSV.
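
To make the space-to-tab normalization concrete, here is a minimal sketch that runs the same regex on two in-memory lines instead of a file. The shortened commit id and detail text are made-up sample data, not the asker's real rows. It also shows a side effect worth knowing about: a double space inside a quoted field is turned into a tab as well.

```python
import csv
import re

# Hypothetical, shortened sample rows separated by runs of spaces.
raw_lines = [
    'CommitId    RefactoringType      RefactoringDetail\n',
    'd38f7b3    Pull Up Attribute   "Pull Up Attribute  protected steps : int"\n',
]

# Replace every run of two or more spaces with a single tab.
fixed_lines = [re.sub(r" {2,}", "\t", line) for line in raw_lines]

# DictReader accepts any iterable of lines, not only file objects.
reader = csv.DictReader(fixed_lines, delimiter="\t")
details = [row["RefactoringDetail"] for row in reader]

# The double space inside the quoted field has become a tab too.
print(details)
```

If the original spacing inside the detail text matters, the regex would need to skip quoted sections; this sketch does not attempt that.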

edited May 29 '19 at 07:22

answered May 29 '19 at 07:16


Thanks, but it returns this error: KeyError: 'RefactoringDetail', and it's a common error for all the code that I tried – Henda Drid May 29 '19 at 07:28
Did you try it with the regex that I edited in? It didn't work for me before I did that. Make sure reader is running on `fixed_spaces`. – Zhenhir May 29 '19 at 07:29
Alternatively, can you show me what one row looks like when you just do `print(row)` instead? – Zhenhir May 29 '19 at 07:31
It also returns an error: Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? – Henda Drid May 29 '19 at 07:39
I pasted your example directly from this page and I didn't need to do that. There must be some issue that didn't carry across. Can you see what happens if you just run `print(open("result_refactorings.csv").readlines()[0])`? – Zhenhir May 29 '19 at 07:58
It returns this: CommitId;RefactoringType;RefactoringDetail – Henda Drid May 29 '19 at 08:07

Toni Sredanović · Accepted Answer · 2019-05-29T08:17:38.053

0

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

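import pandas as pd
# The file uses ';' as its separator, so pass it explicitly
df = pd.read_csv('test.csv', sep=';')
# Select a single column by its header name (returns a pandas Series)
refactoring_details = df['RefactoringDetail']
print(refactoring_details)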

Edit: the separator in the provided file is `;` instead of the default `,`.
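
When the separator is not known in advance, one option is to let the standard library guess it before reading. The sketch below uses `csv.Sniffer` on an in-memory sample; the rows are invented for illustration and only mimic the `CommitId;RefactoringType;RefactoringDetail` header seen in the real file. `Sniffer` is a heuristic, so it is worth checking the detected delimiter rather than trusting it blindly.

```python
import csv
import io

# Hypothetical sample mimicking the semicolon-separated file.
sample = (
    "CommitId;RefactoringType;RefactoringDetail\n"
    'abc123;Pull Up Attribute;"Pull Up Attribute protected steps : int"\n'
    'def456;Move Method;"Move Method public run() : void"\n'
)

# Ask the sniffer to choose among a few likely delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
print(repr(dialect.delimiter))

# Read the rows with the detected dialect and keep one column.
reader = csv.DictReader(io.StringIO(sample), dialect=dialect)
details = [row["RefactoringDetail"] for row in reader]
print(details)
```

With a real file, the same idea would apply to the first few kilobytes read from disk, after which the detected delimiter could be passed to `pd.read_csv` through `sep`.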

edited May 29 '19 at 08:17

answered May 29 '19 at 07:01


Thanks, but it returns this error: 'DataFrame' object has no attribute 'RefactoringDetail' – Henda Drid May 29 '19 at 07:28
Could you upload the file somewhere and link it? – Toni Sredanović May 29 '19 at 07:46
This is the link: https://filebin.net/w1elk0xp7b16v5b4 Thanks for the help – Henda Drid May 29 '19 at 08:03
Oh, so what you were missing is that your separator is `;` instead of the default `,`. I've edited my answer and it should now work with your file. – Toni Sredanović May 29 '19 at 08:16
Glad I could help :) – Toni Sredanović May 29 '19 at 08:24
