
I have the data below in a CSV file, and I am trying to create a one-column DataFrame by selecting one column from the CSV at a time.

sv_m1   rev     ioip    
0       15.31   40      
0       64.9    0       
0       18.36   20      
0       62.85   0       
0       10.31   20      
0       12.84   10      
0       69.95   0       
0       32.81   20  

In the list I get, the first value is the column name and the remaining items are the values.

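import csv
import pandas as pd

input_file = open('df_seg_sample.csv', 'r')
c_reader = csv.reader(input_file, delimiter=',')
# Read one column (index 1) from every row, header included
column = [x[1] for x in c_reader]
label = column[0]
column = column[1:]
# This is the line that raises the TypeError
df_column = pd.DataFrame.from_records(data = column,columns = label)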

However this gives me an error:

  TypeError: Index(...) must be called with a collection of some kind, 'sv_m1' was passed

The value passed in the error message is actually the column name.

How can I create this df? The column name of the df will be the first element in the list and all other items in the list will be the column values.

The reason for not using pandas.read_csv is that the full DataFrame is huge and uses a lot of memory. So I want to read in one column at a time, do some processing, and write it to another CSV.

Shuvayan Das
  • what is your expected output – Pyd May 24 '18 at 05:18
  • any reason not to read the csv with `pandas`, e.g. `pd.read_csv('df_seg_sample.csv')` – AChampion May 24 '18 at 05:19
  • Hello @AChampion. The df is huge and uses a lot of memory if I store it all together in a df. So I want to read in a column at a time, do some processing and then move onto the next column. Thanks! – Shuvayan Das May 24 '18 at 05:25
  • Hello @jezrael. I am sorry. There is no problem anywhere, sir. I just lose track at times. I am sorry if you felt otherwise. Thanks a lot! – Shuvayan Das May 30 '18 at 07:51
  • @jezrael. Thanks. Do you mean this question: https://stackoverflow.com/questions/50598383/process-subset-of-data-based-on-variable-type-in-python ? If yes, it works for me. However, I have posted only a small sample of data for reproducibility. – Shuvayan Das May 30 '18 at 07:59
  • I think your sample data raises an error; I cannot run your sample. – jezrael May 30 '18 at 08:00
  • @jezrael. Updated the question with code to generate data. Please check! – Shuvayan Das May 30 '18 at 08:23
  • I changed it to `df_column = df_data_sample`, ran it, and got `ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series`. Do you use pandas `0.23.0`? – jezrael May 30 '18 at 08:55
  • I get the error in `df_final[target_column] = df_target_attribute[[target_column]]` – jezrael May 30 '18 at 08:56

1 Answer


I think you need read_csv here, with the usecols parameter to select only the second column:

df = pd.read_csv('df_seg_sample.csv', usecols=[1])
print (df)
     rev
0  15.31
1  64.90
2  18.36
3  62.85
4  10.31
5  12.84
6  69.95
7  32.81
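
Since the goal is to process one column at a time, the same `usecols` idea can be put in a loop. The following is only a minimal sketch, assuming the file is comma-separated and has a header row; the helper name `iter_columns` is hypothetical and not part of pandas.

```python
import pandas as pd

def iter_columns(path):
    """Yield (name, Series) pairs, loading a single column per iteration."""
    # nrows=0 parses only the header row, so no data rows are loaded here.
    names = pd.read_csv(path, nrows=0).columns
    for name in names:
        # usecols limits parsing to this one column; the file is re-read each time.
        yield name, pd.read_csv(path, usecols=[name])[name]
```

Each yielded Series could then be processed and written out, for example with `col.to_frame().to_csv(...)`, before the next column is loaded. The trade-off is that the file is scanned once per column, so this saves memory at the cost of extra reading time.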

But if you want to use your solution, it is necessary to wrap the column name in [] to make a one-item list, and to use the plain DataFrame constructor:

data = [x[1] for x in c_reader]
print (data)
['rev', '15.31', '64.9', '18.36', '62.85', '10.31', '12.84', '69.95', '32.81']

df = pd.DataFrame(data[1:], columns=[data[0]])
print (df)
     rev
0  15.31
1   64.9
2  18.36
3  62.85
4  10.31
5  12.84
6  69.95
7  32.81
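
Note that `csv.reader` returns every field as a string, which is why `64.9` is printed here without the trailing zero. If numeric processing is needed, the column can be converted afterwards. A small sketch, using a shortened hard-coded list in place of the reader output:

```python
import pandas as pd

# Values as csv.reader would return them: every field is a string.
data = ['rev', '15.31', '64.9', '18.36']

df = pd.DataFrame(data[1:], columns=[data[0]])
# Convert the string column to a numeric dtype.
# Passing errors='coerce' would turn unparseable values into NaN instead of raising.
df['rev'] = pd.to_numeric(df['rev'])
```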
jezrael