I am confused about where the error creeps in: the deployed code below produced integers and Booleans in the same column (nutrition). This does not occur with the small data sets I use in testing. What could be happening here?

Is it that in months where no LopNr had a sum greater than 1, pandas did not convert True to 1? If so, why not? In any case, is it safe to manually override the end result this way?

The relevant columns of the data look like this:

LopNr      DIAGNOS     INDATUMA
    1      E12 E14     20050705

The code is:

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

all_treatments = list()
filelist = ['file1']

nutrition_codes = '|'.join(["D{}".format(i) for i in range(50, 54)] +  ["E{}".format(i) for i in range(10, 15)] + ["E{}".format(i) for i in range(40, 47)] +  ["E{}".format(i) for i in range(50, 69)])

for file in filelist:
    filename = 'PATH/' + file +'.txt'
    if file[0]=='o':
        treatments = pd.read_table(filename,usecols=[0,8,10])
    elif file[0]=='s':
        treatments = pd.read_table(filename,usecols=[0,8,11])
    else:
        print("file should start with s or o, no?")
        continue  # skip this file so an undefined or stale `treatments` is never appended
    all_treatments.append(treatments)

all_treatments = pd.concat(all_treatments, ignore_index=True)
all_treatments['date'] = pd.to_datetime(all_treatments['INDATUMA'].astype(str), coerce=True)
all_treatments['year'] = all_treatments['date'].dt.year
all_treatments['month'] = all_treatments['date'].dt.month
all_treatments['nutrition'] = all_treatments.DIAGNOS.str.contains(nutrition_codes)
all_treatments = all_treatments.drop(['DIAGNOS','INDATUMA','date'], axis=1)
all_treatments = all_treatments.groupby(['LopNr','year','month']).sum().astype(int,copy=False,raise_on_error=False)
all_treatments.to_csv('PATH/treatments_monthly.csv')
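
One possible cause, offered only as an assumption since the real data is not shown: if some rows have a missing DIAGNOS, `str.contains` leaves those rows missing instead of returning False, so the nutrition column is no longer purely boolean before the groupby. A minimal sketch with hypothetical sample rows and a simplified stand-in pattern (neither comes from the question), showing how `na=False` keeps the column boolean so that the sum yields integers:

```python
import pandas as pd

# Hypothetical sample rows; the None stands in for a missing DIAGNOS value.
sample = pd.DataFrame({
    "LopNr": [1, 1, 2, 2],
    "year": [2005, 2005, 2005, 2005],
    "month": [7, 7, 7, 8],
    "DIAGNOS": pd.Series(["E12 E14", "E11", None, "X99"], dtype=object),
})

pattern = "E1[0-4]"  # simplified stand-in for nutrition_codes

# Without na=..., the row with a missing DIAGNOS stays missing in the result.
raw = sample["DIAGNOS"].str.contains(pattern)

# With na=False, every row becomes a real boolean and the dtype is bool.
clean = sample["DIAGNOS"].str.contains(pattern, na=False)

sample["nutrition"] = clean
monthly = sample.groupby(["LopNr", "year", "month"])["nutrition"].sum()
```

If this is what happens in the real files, it would also explain why small test data without missing diagnoses never shows the problem.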
ekad
László
  • Please provide a minimal, complete, verifiable example; see http://stackoverflow.com/help/mcve and http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples. There are many lines of code here, of which only a couple are obviously relevant to the question. Also, the one line of sample data is good, but even better is to provide the minimal amount necessary to reproduce the error (that could be 1 row, but it could be 20). – JohnE Jul 31 '15 at 13:18
  • In summary, please provide more data (but the minimal amount, in a form that can be copied and pasted) and less code. – JohnE Jul 31 '15 at 13:22
  • @JohnE I agree with you in principle. Yet I told you that small test data did not reproduce the error. If I could reverse engineer data that does, I could probably also resolve the issue myself. The same goes for knowing which lines are relevant. – László Aug 11 '15 at 12:23
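
One way to narrow down which step introduces the mixed values, without having to reverse engineer the data first, is to count the Python types present in the column after each step. A minimal sketch; `type_counts` is a hypothetical helper and the `mixed` series is made-up data that merely resembles the column described in the question:

```python
import pandas as pd

def type_counts(column):
    """Count how many values of each Python type a column holds."""
    return column.map(lambda value: type(value).__name__).value_counts()

# Made-up object column holding both booleans and integers.
mixed = pd.Series([True, 2, 0, True, 3], dtype=object)
counts = type_counts(mixed)
```

Calling `type_counts(all_treatments['nutrition'])` right after the `str.contains` line and again after the groupby would show at which step a second type first appears, and the rows of that type could then be trimmed down into a reproducible example.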
