
I have data for 1500 people; each person has about 10000 features, and each feature has a value. There is a dictionary called dict_f whose keys = {'name': value, 'f1': value, 'f2': value, .......} are the person's feature names. For example:

name    f1     f2     f3    f4  ............
name1   1      2      3     4
name2   1.1    2.1    3.1   4.1
...............................

I want to write these data to a CSV file, and then, in another code file, read the CSV file into a pandas data frame. But I found that writing each person's values (the dict_f; please note that the feature values differ from person to person) takes about 1.4 s, so writing 1500 people's data will take about 1500 * 1.4 s. That is too much time; I want to reduce it and improve the speed of writing the data to CSV.

Part of the code is as follows (please note that lst_field_names_0 is the list of feature names):

import csv

with open('data/feature_data_0_0.csv', mode='wt', encoding='utf-8') as outfile:
    fieldnames = lst_field_names_0
    writer = csv.DictWriter(outfile, fieldnames, restval='""', dialect=csv.unix_dialect)
    writer.writeheader()
    for i in range(len(name_list)):
        writer.writerow(dict_f)
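One likely speed-up, independent of the pandas question, is to collect all computed dicts first and write them in a single writerows() call rather than one writerow() per person. A minimal sketch with hypothetical stand-in data (two people, three features, instead of the question's 1500 x 10000):

```python
import csv
import io

# Hypothetical stand-ins for lst_field_names_0 and the computed per-person dicts
lst_field_names_0 = ['name', 'f1', 'f2']
rows = [
    {'name': 'name1', 'f1': 1, 'f2': 2},
    {'name': 'name2', 'f1': 1.1, 'f2': 2.1},
]

# Write the whole batch at once instead of looping over writerow()
buf = io.StringIO()  # in the real code this would be the open file object
writer = csv.DictWriter(buf, lst_field_names_0, restval='""', dialect=csv.unix_dialect)
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

The per-row formatting work is the same, but batching removes Python-level loop overhead and makes it easy to profile the write step separately from the feature computation.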

Then I want to use pandas to read the CSV file:

 feature_dataframe = pd.read_csv('data/feature_data_0_0.csv')

Could you help me improve the speed of writing to the CSV file?

Thanks in advance!

tktktk0711
  • "There is dictionary called dict_f, the key of dict_f are the features of people.". Do you mean to say that there is a list that contains 1500 instances of `dict_f`? At the moment, you're just writing 1500 identical feature sets to file (i.e., all the people are the same). –  Nov 11 '16 at 04:32
  • Thanks for @Evert's comment. Please see the updated question. Although all the people have the same number and names of features, the values are different. – tktktk0711 Nov 11 '16 at 05:36
  • It's still not clear whether you have a list of dicts, or whether each value inside your dict is a list (of length 1500). –  Nov 11 '16 at 05:52
  • Why go through a file? Convert your dict or list of dicts straight to a pandas DataFrame. You can always write that to a CSV file in just one line. –  Nov 11 '16 at 05:53
  • There is a dict for each person's 10000 feature values: dict_f = {'name': n, 'f1': value1, 'f2': value..........}. I said 1500 times because there are 1500 people. In every loop iteration, we compute the feature values for one person (I haven't explained this in detail since the code is long); after the feature values for that person are obtained, we write this dict_f to the CSV file. – tktktk0711 Nov 11 '16 at 06:02
  • If you @Evert don't know exactly what this question means, just direct message me – tktktk0711 Nov 11 '16 at 06:03
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/127860/discussion-between-tktktk0711-and-evert). – tktktk0711 Nov 11 '16 at 06:04
  • "we compute the value of features for each people": as you don't show that, and instead show a variable that *doesn't change* being written to file, confusion arises about what exactly `dict_f` is and what is written to disk. Still, just bypass creating a CSV file and create your DataFrame directly, by appending each computed dict to a list. See [convert list of dictionaries to dataframe](http://stackoverflow.com/questions/20638006/convert-list-of-dictionaries-to-dataframe). –  Nov 11 '16 at 06:14
  • Thanks for your comments; I have checked the question you mentioned. In addition, I have used this method and found it is much faster than my current method. You can post that as an answer. But I have another question that I will post later; please give me some advice. – tktktk0711 Nov 11 '16 at 06:25
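The approach suggested in the comments, appending each computed dict to a list and building the DataFrame directly instead of going through a CSV file, can be sketched as follows (the loop body and feature values here are placeholders for the question's actual computation):

```python
import pandas as pd

# Collect each person's computed feature dict in a list
# (placeholder values; the real loop runs 1500 times with 10000 features)
records = []
for i in range(3):
    dict_f = {'name': f'name{i + 1}', 'f1': 1 + 0.1 * i, 'f2': 2 + 0.1 * i}
    records.append(dict_f)

# One DataFrame construction instead of one writerow() call per person
feature_dataframe = pd.DataFrame(records)

# If a CSV file is still needed, writing it is then a single call:
# feature_dataframe.to_csv('data/feature_data_0_0.csv', index=False)
```

This skips the intermediate read_csv round trip entirely, which is where most of the per-person overhead was going.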

0 Answers