0

I am trying to come up with a way to assign values to dates in my panda's data frame. I currently have a list of dates grouped by "state" as shown in the table. I would like to have "03-01-2020" be "day 1" and the current day be the last day-- in a column called "day". I tried using this code to create the dictionary I could use in the Pandas dataframe with the code below:

counter = 0 
dict1 = {}
for i in df.date:
  if i not in dict1:
    counter +=1
    dict1[i] = counter
dict1

This code was to assign a number to a date. I then tried the following code to match the date column with its corresponding number. I tried putting it in a list so I can eventually make a column with this list and append it to the data frame

list1 = []
for x in df.date:
  for key,value in enumerate(dict1):
    if value == x:
      list1.append(key)

print(list1)

However, when I tried the code I got a large list just iterating 0 to 360 everytime. This is not what I wanted. I wanted that if a state (remember I grouped the df by state) started recording data at different days the number would correspond with the date.

Ex: if Texas started recording at 03-01-2020 it would be 1st day in the "Day" Column However, if Arizona started recording at 03-05-2020, it would be the 5th day in "Day" Column

Data Frame Looks like this:


Date      State  Cases 
2020-03-01 AK   15.5000 
2020-03-02 AK   28.4048 
2020-03-03 AK   43.8333 
2020-03-04 AK   60.8905 
2020-03-05 AK   81.3548 

Dylan
  • 39
  • 6
  • Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. – Prune Feb 26 '21 at 04:42
  • Fixed. Thank you! – Dylan Feb 26 '21 at 05:01

1 Answers1

0

The following code:

import pandas as pd

df = pd.DataFrame(
    [
        ['2020-03-01', 'AK', 0, 15.5000],
        ['2020-03-02', 'AK', 0, 28.4048],
        ['2020-03-03', 'AK', 0, 43.8333],
        ['2020-03-04', 'AK', 0, 60.8905],
        ['2020-03-10', 'AK', 0, 81.3548],
        ['2020-03-02', 'TX', 0, 15.5000],
        ['2020-03-08', 'TX', 0, 28.4048],
        ['2020-03-05', 'TX', 0, 43.8333],
        ['2020-03-10', 'TX', 0, 60.8905],
        ['2020-03-11', 'TX', 0, 81.3548]
    ],
    columns=['date', 'state', 'Deaths', 'Cases']
)

df['date'] = pd.to_datetime(df['date'])

print (df.head(20))

df['Day'] = df.groupby('state')['date'].rank(ascending=True)

print (df.head(20))

Produces the following output:

        date state  Deaths    Cases
0 2020-03-01    AK       0  15.5000
1 2020-03-02    AK       0  28.4048
2 2020-03-03    AK       0  43.8333
3 2020-03-04    AK       0  60.8905
4 2020-03-10    AK       0  81.3548
5 2020-03-02    TX       0  15.5000
6 2020-03-08    TX       0  28.4048
7 2020-03-05    TX       0  43.8333
8 2020-03-10    TX       0  60.8905
9 2020-03-11    TX       0  81.3548
        date state  Deaths    Cases  Day
0 2020-03-01    AK       0  15.5000  1.0
1 2020-03-02    AK       0  28.4048  2.0
2 2020-03-03    AK       0  43.8333  3.0
3 2020-03-04    AK       0  60.8905  4.0
4 2020-03-10    AK       0  81.3548  5.0
5 2020-03-02    TX       0  15.5000  1.0
6 2020-03-08    TX       0  28.4048  3.0
7 2020-03-05    TX       0  43.8333  2.0
8 2020-03-10    TX       0  60.8905  4.0
9 2020-03-11    TX       0  81.3548  5.0

Process finished with exit code 0

What you look for is groupby and rank. Please be aware of the datatype of date column - it has to be of a numerical type (datetime) for ranking.

Docs for rank Docs for groupby

Rikki
  • 3,338
  • 1
  • 22
  • 34