
I'm creating a new DataFrame from an existing one. I have the timestamps of the first and last rows stored in two separate variables. How do I extract the rows between them to build the new DataFrame?

  1. I already checked that the variables are correct, but when I create the new DataFrame and call .describe(), the DataFrame is empty.
  2. I tried replacing the variables with the actual dates. In that case one of the statistics is wrong: the value reported for first is not the date I passed in.

FIRST TRY

corrected_log = git_log.loc['first_commit_timestamp':'last_commit_timestamp', : ]

print(corrected_log.describe())

OUTPUT

                  timestamp               author
count                 0                     0
unique                0                     0

SECOND TRY

corrected_log = git_log.loc['2005-04-16 22:20:36':'2019-04-05 05:07:45', : ]

OUTPUT

                                   timestamp                   author
count                                  1400                      1400
unique                                 1357                       393
top                         2014-12-11 23:56:04          Benjamin Romer
freq                                    15                         46
first                       2013-10-02 14:56:14                    NaN
last                        2015-08-01 10:03:00                    NaN

I expected the first date to be '2005-04-16 22:20:36', but the actual output is '2013-10-02 14:56:14'. It seems that the first row of the new DataFrame is wrong.
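For reference, one way to select the rows between two timestamps without relying on the index at all is a boolean mask on the timestamp column. This is only a minimal sketch under assumptions: the sample rows are made up (the real `git_log` is not shown here), and the `timestamp` column is assumed to already hold datetime values.

```python
import pandas as pd

# Hypothetical stand-in for git_log; the real data is not part of the question.
git_log = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2013-10-02 14:56:14",
        "2005-04-16 22:20:36",
        "2019-04-05 05:07:45",
        "2030-01-01 00:00:00",  # made-up out-of-range row
        "2015-08-01 10:03:00",
    ]),
    "author": ["a", "b", "c", "d", "e"],
})

# The bounds are variables holding Timestamp objects, so they are used
# without quotes; 'first_commit_timestamp' in quotes is just a string.
first_commit_timestamp = pd.Timestamp("2005-04-16 22:20:36")
last_commit_timestamp = pd.Timestamp("2019-04-05 05:07:45")

# Series.between is inclusive on both ends by default.
in_range = git_log["timestamp"].between(first_commit_timestamp, last_commit_timestamp)
corrected_log = git_log[in_range]

print(corrected_log["timestamp"].min())
print(corrected_log["timestamp"].max())
```

Because the mask compares column values, it works whether or not the rows are sorted and regardless of what the index is.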

  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Trenton McKinney Aug 31 '19 at 01:56
  • it must be `git_log.loc[first_commit_timestamp:last_commit_timestamp, : ]` (i.e. without quotes) instead of `git_log.loc['first_commit_timestamp':'last_commit_timestamp', : ]` – Stef Aug 31 '19 at 10:24
  • also you'll have to check your data types: right now it looks like your timestamp column is of `str` type, but you'll probably want to have a datetime type column (see [`to_datetime`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) for conversion) – Stef Aug 31 '19 at 10:28
  • Just tested data types for 'first_commit_timestamp' and 'last_commit_timestamp' and this is the answer . There must be something obvious... Thank you for your help. – luliloop Aug 31 '19 at 14:21
  • @Stef I also tried without quotes, `git_log.loc[first_commit_timestamp:last_commit_timestamp, : ]`, but this is what appeared: KeyError: Timestamp('2005-04-16 22:20:36') – luliloop Aug 31 '19 at 15:56
  • this tells us that the given Timestamp('2005-04-16 22:20:36') is not in your index. Without some concrete example data it is almost impossible to give you a working solution, see @Trenton_M's comment above. – Stef Aug 31 '19 at 19:28
  • @Stef Thank you!! I think you're right. Maybe the problem is not in this part of the code. This is the first time I've asked a question, and this is my first project. You really helped me. I'll try again and maybe ask later with more info. – luliloop Aug 31 '19 at 21:55
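The points raised in the comments (unquoted variables, datetime conversion, and a label missing from the index) can be illustrated with a small sketch. It rests on assumptions that the thread does not confirm: the sample rows are invented, and the timestamps are moved into the index so that `.loc` slices by date. With an unsorted DatetimeIndex, slicing by a Timestamp that is not itself an index label raises a KeyError; after `sort_index()` the same slice works. This is one way such a KeyError can arise, not necessarily the cause in the original code.

```python
import pandas as pd

# Hypothetical sample data: timestamps stored as strings, in no particular order.
git_log = pd.DataFrame({
    "timestamp": ["2013-10-02 14:56:14", "2005-04-16 22:20:37",
                  "2019-04-05 05:07:44", "2015-08-01 10:03:00"],
    "author": ["a", "b", "c", "d"],
})

# Convert the strings to real datetime values before comparing or slicing.
git_log["timestamp"] = pd.to_datetime(git_log["timestamp"])

# Bounds that do not appear in the data as exact labels.
first_commit_timestamp = pd.Timestamp("2005-04-16 22:20:36")
last_commit_timestamp = pd.Timestamp("2015-12-31 23:59:59")

indexed = git_log.set_index("timestamp")

# Unsorted index + a bound that is not an existing label -> KeyError.
try:
    indexed.loc[first_commit_timestamp:last_commit_timestamp]
    raised = False
except KeyError:
    raised = True
print("KeyError on unsorted index:", raised)

# Once the index is sorted, .loc can locate the bounds by position in time.
sorted_log = indexed.sort_index()
corrected_log = sorted_log.loc[first_commit_timestamp:last_commit_timestamp]
print(corrected_log)
```

Sorting first is the design choice that matters here: label slicing on a sorted index selects everything between the two bounds, even when the bounds themselves are not present as rows.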

0 Answers