0

When importing with read_csv, I added the first 2 rows as headers in my dataframe so I have what I think is an Object as all my column names.

For example when I type

df.iloc[:,0] 

I get the following to be returned.

Name: (RECORD, Unnamed: 0_level_1), Length: 36000, dtype: int64

I'm unsure how to use

df.LOC["ColName" = "value"]

Usually for ColName I'd put a single string that is usually the column name. I'm unsure for what do I put for ColName ?

Edit:

>>>> df.head(2).to_dict()
{('RECORD', 'Unnamed: 0_level_1'): {0: 1, 1: 2}, ('TIME/DATE', 'Unnamed: 
1_level_1'): {0: '19:55:48.52  09/30/22  ', 1: '19:55:48.62  09/30/22  
'}, ('ALARM OR MESSAGE', 'Unnamed: 2_level_1'): {0: 'NO ALARMS                   
', 1: 'NO ALARMS'} 

I tried to convert

df['TIME']= pd.to_datetime(df['TIME/DATE'],, format='%m/%d/%Y %H:%M:%S', infer_datetime_format=True)

But I keep getting an error:

ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing

I believe it's because the Heading is an object.

user3042850
  • 323
  • 1
  • 3
  • 15

1 Answers1

1

Your issue is that you have set up a multi-level column index by specifying the first 2 rows of your CSV file as headers, however it seems the second row didn't have any useful information in it (hence the Unnamed... column names). You can get rid of that using droplevel:

df = df.droplevel(level=1,axis=1)

This gives:

   RECORD                TIME/DATE ALARM OR MESSAGE
0       1  19:55:48.52  09/30/22          NO ALARMS
1       2    19:55:48.62  09/30/22        NO ALARMS

You should then clean up the extra spaces on your TIME/DATE column before processing with the appropriate format:

df['TIME/DATE'] = df['TIME/DATE'].str.strip()
df['TIME']= pd.to_datetime(df['TIME/DATE'], format='%H:%M:%S.%f  %m/%d/%y')

Output:

   RECORD              TIME/DATE ALARM OR MESSAGE                    TIME
0       1  19:55:48.52  09/30/22        NO ALARMS 2022-09-30 19:55:48.520
1       2  19:55:48.62  09/30/22        NO ALARMS 2022-09-30 19:55:48.620

If you need the multi-level index, you can use df.loc to access the column values:

df.loc[:, ('TIME/DATE', 'Unnamed:1_level_1')] = df.loc[:, ('TIME/DATE', 'Unnamed:1_level_1')].str.strip()
df['TIME'] = pd.to_datetime(df.loc[:, ('TIME/DATE', 'Unnamed:1_level_1')], format='%H:%M:%S.%f  %m/%d/%y')

Output:

              RECORD              TIME/DATE   ALARM OR MESSAGE                    TIME
  Unnamed: 0_level_1      Unnamed:1_level_1 Unnamed: 2_level_1
0                  1  19:55:48.52  09/30/22          NO ALARMS 2022-09-30 19:55:48.520
1                  2  19:55:48.62  09/30/22          NO ALARMS 2022-09-30 19:55:48.620
Nick
  • 138,499
  • 22
  • 57
  • 95
  • I need that second row later on in the dataset - is there a way to just concatenate the two entries into one entry? – user3042850 May 05 '23 at 02:34
  • @user3042850 see my edit to retain the multi-level index; or you can see https://stackoverflow.com/questions/24290297/pandas-dataframe-with-multiindex-column-merge-levels for merging the levels – Nick May 05 '23 at 03:42